(57) Incorporation of certain amino acid analogs into polypeptides produced by cells which do not
ordinarily provide polypeptides containing such amino acid analogs is accomplished by subjecting
the cells to growth media containing such amino acid analogs. The degree of incorporation can be
regulated by adjusting the concentration of amino acid analogs in the media and/or by adjusting
osmolality of the media. Such incorporation allows the chemical and physical characteristics of
polypeptides to be altered and studied. In addition, nucleic acid and corresponding proteins
including a domain from a physiologically active peptide and a domain from an extracellular matrix
protein which is capable of providing a self-aggregate are provided. Human extracellular matrix
proteins capable of providing a self-aggregate collagen are provided which are produced by
prokaryotic cells. Preferred codon usage is employed to produce extracellular matrix proteins in
prokaryotes.
Description
BACKGROUND
1. Technical Field

[0001] Engineered polypeptides and chimeric polypeptides having incorporated amino acids which enhance or
otherwise modify properties of such polypeptides.

2. Description of Related Art

[0002] Genetic engineering allows polypeptide production to be transferred from one organism to another. In doing
so, a portion of the production apparatus indigenous to an original host is transplanted into a recipient. Frequently, the
original host has evolved certain unique processing pathways in association with polypeptide production which are not
contained in or transferred to the recipient. For example, it is well known that mammalian cells incorporate a complex
set of post-translational enzyme systems which impart unique characteristics to protein products of the systems. When
a gene encoding a protein normally produced by mammalian cells is transferred into a bacterial or yeast cell, the protein
may not be subjected to such post-translational modification and the protein may not function as originally intended.
[0003] Normally, the process of polypeptide or protein synthesis in living cells involves transcription of DNA into RNA
and translation of RNA into protein. Three forms of RNA are involved in protein synthesis: messenger RNA (mRNA)
carries genetic information to ribosomes made of ribosomal RNA (rRNA) while transfer RNA (tRNA) links to free amino
acids in the cell pool. Amino acid/tRNA complexes line up next to codons of mRNA, with actual recognition and binding
being mediated by tRNA. Cells can contain up to twenty amino acids which are combined and incorporated in sequences
of varying permutations into proteins. Each amino acid is distinguished from the other nineteen amino acids and charged
to tRNA by enzymes known as aminoacyl-tRNA synthetases. As a general rule, amino acid/tRNA complexes are quite
specific and normally only a molecule with an exact stereochemical configuration is acted upon by a particular
aminoacyl-tRNA synthetase.
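
The transcription and codon-by-codon translation described above can be illustrated with a minimal Python sketch. It is not part of the disclosure: the function names `transcribe` and `translate` are hypothetical, and `CODON_TABLE` holds only a small illustrative subset of the standard genetic code.

```python
# Illustrative subset of the standard genetic code (RNA codons -> residues).
# Only a few codons are listed; a complete table has 64 entries.
CODON_TABLE = {
    "AUG": "Met",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "CCU": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(dna_coding_strand):
    """Return the mRNA corresponding to a DNA coding strand (T replaced by U)."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time and return the list of residues.

    Translation stops at the first stop codon. Codons missing from the
    illustrative table raise ValueError rather than being guessed.
    """
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        residue = CODON_TABLE.get(codon)
        if residue is None:
            raise ValueError(f"codon {codon!r} is not in the illustrative table")
        if residue == "Stop":
            break
        residues.append(residue)
    return residues


if __name__ == "__main__":
    print(translate(transcribe("ATGGGTCCGGCTTAA")))
```

The sketch models only the codon-to-residue mapping; the charging of each tRNA by its aminoacyl-tRNA synthetase, which is where analog incorporation is decided, is not represented.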

[0004] In many living cells some amino acids are taken up from the surrounding environment and some are
synthesized within the cell from precursors, which in turn have been assimilated from outside the cell. In certain instances,
a cell is auxotrophic, i.e., it requires a specific growth substance beyond the minimum required for normal metabolism
and reproduction which it must obtain from the surrounding environment. Some auxotrophs depend upon the external
environment to supply certain amino acids. This feature allows certain amino acid analogs to be incorporated into
proteins produced by auxotrophs by taking advantage of relatively rare exceptions to the above rule regarding
stereochemical specificity of aminoacyl-tRNA synthetases. For example, proline is such an exception, i.e., the amino acid
activating enzymes responsible for the synthesis of prolyl-tRNA complex are not as specific as others. As a
consequence, certain proline analogs have been incorporated into bacterial, plant, and animal cell systems. See Tan et al.,
Proline Analogues Inhibit Human Skin Fibroblast Growth and Collagen Production in Culture, Journal of Investigative
Dermatology, 80:261-267 (1983).

[0005] A method of incorporating unnatural amino acids into proteins is described, e.g., in Noren et al., A General
Method For Site-Specific Incorporation of Unnatural Amino Acids Into Proteins, Science, Vol. 244, pp. 182-188 (1989)
wherein chemically acylated suppressor tRNA is used to insert an amino acid in response to a stop codon substituted
for the codon encoding the residue of interest. See also, Dougherty et al., Synthesis of a Genetically Engineered Repetitive
Polypeptide Containing Periodic Selenomethionine Residues, Macromolecules, Vol. 26, No. 7, pp. 1779-1781 (1993),
which describes subjecting an E. coli methionine auxotroph to selenomethionine-containing medium and postulates
on the basis of experimental data that selenomethionine may completely replace methionine in all proteins produced
by the cell.

[0006] cis-Hydroxy-L-proline has been used to study its effects on collagen by incorporation into eukaryotic cells
such as cultured normal skin fibroblasts (see Tan et al., supra) and tendon cells from chick embryos (see e.g., Uitto et
al., Procollagen Polypeptides Containing cis-4-Hydroxy-L-proline are Overglycosylated and Secreted as Nonhelical
Pro-γ-Chains, Archives of Biochemistry and Biophysics, 185:1:214-221 (1978)). However, investigators found that
trans-4-hydroxyproline would not link with proline-specific tRNA of prokaryotic E. coli. See Papas et al., Analysis of the
Amino Acid Binding to the Proline Transfer Ribonucleic Acid Synthetase of Escherichia coli, Journal of Biological
Chemistry, 245:7:1588-1595 (1970). Another unsuccessful attempt to incorporate trans-4-hydroxyproline into prokaryotes is
described in Deming et al., In Vitro Incorporation of Proline Analogs into Artificial Proteins, Poly. Mater. Sci. Engin.
Proceed., Vol. 71, p. 673-674 (1994). Deming et al. report surveying the potential for incorporation of certain proline
analogs, i.e., L-azetidine-2-carboxylic acid, L-γ-thiaproline, 3,4-dehydroproline and L-trans-4-hydroxyproline into
artificial proteins expressed in E. coli cells. Only L-azetidine-2-carboxylic acid, L-γ-thiaproline and 3,4-dehydroproline are
reported as being incorporated into proteins in E. coli cells in vivo.
[0007] Extracellular matrix proteins ("EMPs") are found in spaces around or near cells of multicellular organisms and
are typically fibrous proteins of two functional types: mainly structural, e.g., collagen and elastin, and mainly adhesive,
e.g., fibronectin and laminin. Collagens are a family of fibrous proteins typically secreted by connective tissue cells.
Twenty distinct collagen chains have been identified which assemble to form a total of about ten different collagen
molecules. A general discussion of collagen is provided by Alberts, et al., The Cell, Garland Publishing, pp. 802-823
(1989), incorporated herein by reference. Other fibrous or filamentous proteins include Type I IF proteins, e.g., keratins;
Type II IF proteins, e.g., vimentin, desmin and glial fibrillary acidic protein; Type III IF proteins, e.g., neurofilament
proteins; and Type IV IF proteins, e.g., nuclear lamins.

[0008] Type I collagen is the most abundant form of the fibrillar, interstitial collagens and is the main component of
the extracellular matrix. Collagen monomers consist of about 1000 amino acid residues in a repeating array of Gly-X-Y
triplets. Approximately 35% of the X and Y positions are occupied by proline and trans-4-hydroxyproline. Collagen
monomers associate into triple helices which consist of one α2 and two α1 chains. The triple helices associate into
fibrils which are oriented into tight bundles. The bundles of collagen fibrils are further organized to form the scaffold
for extracellular matrix.
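
As an illustration of the Gly-X-Y repeat described above, the following Python sketch splits a residue list into triplets, checks that each triplet begins with glycine, and reports the fraction of X and Y positions occupied by proline or hydroxyproline. The function names and the abbreviation "Hyp" for trans-4-hydroxyproline are assumptions made for the example, and the test sequence is invented rather than taken from any collagen chain.

```python
def gly_xy_triplets(residues):
    """Split a residue list into (Gly, X, Y) triplets.

    Raises ValueError if the length is not a multiple of three or if any
    triplet does not start with glycine.
    """
    if len(residues) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    triplets = [tuple(residues[i:i + 3]) for i in range(0, len(residues), 3)]
    for index, triplet in enumerate(triplets):
        if triplet[0] != "Gly":
            raise ValueError(f"triplet {index} does not start with Gly: {triplet}")
    return triplets


def imino_fraction(residues):
    """Fraction of X and Y positions occupied by Pro or Hyp."""
    triplets = gly_xy_triplets(residues)
    xy_positions = [residue for triplet in triplets for residue in triplet[1:]]
    hits = sum(residue in ("Pro", "Hyp") for residue in xy_positions)
    return hits / len(xy_positions)


if __name__ == "__main__":
    # Invented example sequence, not a real collagen fragment.
    example = ["Gly", "Pro", "Hyp", "Gly", "Ala", "Hyp",
               "Gly", "Pro", "Ala", "Gly", "Leu", "Ala"]
    print(imino_fraction(example))
```

Applied to a full-length chain, the same calculation could be compared against the approximately 35% figure stated above.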

[0009] In mammalian cells, post-translational modification of collagen contributes to its ultimate chemical and
physical properties and includes proteolytic digestion of pro-regions, hydroxylation of lysine and proline, and glycosylation
of hydroxylated lysine. The proteolytic digestion of collagen involves the cleavage of pro-regions from the N and C
termini. It is known that hydroxylation of proline is essential for the mechanical properties of collagen. Collagen with
low levels of 4-hydroxyproline has poor mechanical properties, as highlighted by the sequelae associated with scurvy.
4-hydroxyproline adds stability to the triple helix through hydrogen bonding and through restricting rotation about C-N
bonds in the polypeptide backbone. In the absence of a stable structure, naturally occurring cellular enzymes contribute
to degrading the collagen polypeptide.

[0010] The structural attributes of Type I collagen along with its generally perceived biocompatibility make it a
desirable surgical implant material. Collagen is purified from bovine skin or tendon and used to fashion a variety of medical
devices including hemostats, implantable gels, drug delivery vehicles and bone substitutes. However, when implanted
into humans, bovine collagen can cause acute and delayed immune responses.

[0011] As a consequence, researchers have attempted to produce human recombinant collagen with all of its
structural attributes in commercial quantities through genetic engineering. Unfortunately, production of collagen by
commercial mass producers of protein such as E. coli has not been successful. A major problem is the extensive
post-translational modification of collagen by enzymes not present in E. coli. Failure of E. coli cells to provide proline
hydroxylation of unhydroxylated collagen proline prevents manufacture of structurally sound collagen in commercial
quantities.

[0012] Another problem in attempting to use E. coli to produce human collagen is that E. coli prefer particular codons
in the production of polypeptides. Although the genetic code is identical in both prokaryotic and eukaryotic organisms,
the particular codon (of the several possible for most amino acids) that is most commonly utilized can vary widely
between prokaryotes and eukaryotes. See, Wada, K.-N., Y. Wada, F. Ishibashi, T. Gojobori and T. Ikemura. Nucleic
Acids Res. 20, Supplement: 2111-2118, 1992. Efficient expression of heterologous (e.g. mammalian) genes in
prokaryotes such as E. coli can be adversely affected by the presence in the gene of codons infrequently used in E. coli and
expression levels of the heterologous protein often rise when rare codons are replaced by more common ones. See,
e.g., Williams, D.P., D. Regier, D. Akiyoshi, F. Genbauffe and J.R. Murphy. Nucleic Acids Res. 16: 10453-10467, 1988
and Höög, J.-O., H. v. Bahr-Lindström, H. Jörnvall and A. Holmgren. Gene 43: 13-21, 1986. This phenomenon is
thought to be related, at least in part, to the observation that a low frequency of occurrence of a particular codon
correlates with a low cellular level of the transfer RNA for that codon. See, Ikemura, T. J. Mol. Biol. 158: 573-597, 1982
and Ikemura, T. J. Mol. Biol. 146: 1-21, 1981. Thus, the cellular tRNA level may limit the rate of translation of the codon
and therefore influence the overall translation rate of the full-length protein. See, Ikemura, T. J. Mol. Biol. 146: 1-21,
1981; Bonekamp, F. and F.K. Jensen. Nucleic Acids Res. 16: 3013-3024, 1988; Misra, R. and P. Reeves, Eur. J.
Biochem. 152: 151-155, 1985; and Post, L.E., G.D. Strycharz, M. Nomura, H. Lewis and P.P. Lewis. Proc. Natl. Acad.
Sci. U.S.A. 76: 1697-1701, 1979. In support of this hypothesis is the observation that the genes for abundant E. coli
proteins generally exhibit bias towards commonly used codons that represent highly abundant tRNAs. See, Ikemura,
T. J. Mol. Biol. 146: 1-21, 1981; Bonekamp, F. and F.K. Jensen. Nucleic Acids Res. 16: 3013-3024, 1988; Misra, R. and
P. Reeves, Eur. J. Biochem. 152: 151-155, 1985; and Post, L.E., G.D. Strycharz, M. Nomura, H. Lewis and P.P. Lewis.
Proc. Natl. Acad. Sci. U.S.A. 76: 1697-1701, 1979. In addition to codon frequency, the codon context (i.e. the
surrounding nucleotides) can also affect expression.
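
The codon-usage considerations above can be made concrete with a short Python sketch that tallies the codons of a coding sequence and reports where codons from a caller-supplied "rare" set occur. The function names are hypothetical, and the set `EXAMPLE_RARE_CODONS` is an illustrative assumption rather than a usage table drawn from the cited references; a real analysis would take frequencies from a published codon usage table for the host.

```python
from collections import Counter

# Illustrative assumption only: a few codons treated as infrequently used in
# the host. Substitute values from a published usage table for real work.
EXAMPLE_RARE_CODONS = {"AGG", "AGA", "CCC"}


def split_codons(cds):
    """Split an in-frame DNA coding sequence into a list of codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def codon_counts(cds):
    """Count how many times each codon occurs in the coding sequence."""
    return Counter(split_codons(cds))


def rare_codon_positions(cds, rare_codons):
    """Return zero-based codon indices at which a codon from rare_codons occurs."""
    return [index for index, codon in enumerate(split_codons(cds))
            if codon in rare_codons]


if __name__ == "__main__":
    cds = "GGTCCCGGTAGGCCG"  # invented example, five codons
    print(codon_counts(cds))
    print(rare_codon_positions(cds, EXAMPLE_RARE_CODONS))
```

The tally considers codon frequency only; codon context, which the paragraph above notes can also affect expression, is not modelled.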

[0013] Although it would appear that substituting preferred codons for rare codons could be expected to increase
expression of heterologous proteins in host organisms, such is not the case. Indeed, "it has not been possible to
formulate general and unambiguous rules to predict whether the content of low-usage codons in a specific gene might
adversely affect the efficiency of its expression in E. coli." See page 524 of S.C. Makrides (1996), Strategies for
Achieving High-Level Expression of Genes in Escherichia coli. Microbiological Reviews 60, 512-538. For example, in one
case, various gene fusions between yeast α factor and somatomedin C were made that differed only in coding
sequence. In these experiments, no correlation was found between codon bias and expression levels in E. coli. Ernst,
J.F. and Kawashima, E. (1988), J. Biotechnology, 7, 1-10. In another instance, it was shown that despite the higher
frequency of optimal codons in a synthetic β-globin gene compared to the native sequence, no difference was found
in the protein expression from these two constructs when they were placed behind the T7 promoter. Hernan et al.
(1992), Biochemistry, 31, 8619-8628. Conversely, there are many examples of proteins with a relatively high percentage
of rare codons that are well expressed in E. coli. A table listing some of these examples and a general discussion can
be found in Makoff, A.J. et al. (1989), Nucleic Acids Research, 17, 10191-10202. In one case, introduction of
non-optimal, rare arginine codons at the 3' end of a gene actually increased the yield of expressed protein. Gursky, Y.G.
and Beabealashvilli, R.Sh. (1994), Gene 148, 15-21.

[0014] Failure to provide post-translational modifications such as hydroxylation of proline and the presence in human
collagen of rare codons for E. coli may be contributing to the difficulties encountered in the expression of human
collagen genes in E. coli.

SUMMARY

[0015] A method of incorporating an amino acid analog into a polypeptide produced by a cell is provided which
includes providing a cell selected from the group consisting of prokaryotic cell and eukaryotic cell, providing growth
media containing at least one amino acid analog selected from the group consisting of trans-4-hydroxyproline,
3-hydroxyproline, cis-4-fluoro-L-proline and combinations thereof and contacting the cell with the growth media wherein
the at least one amino acid analog is assimilated into the cell and incorporated into at least one polypeptide.
[0016] Also provided is a method of substituting an amino acid analog of an amino acid in a polypeptide produced
by a cell selected from the group consisting of prokaryotic cell and eukaryotic cell, which includes providing a cell
selected from the group consisting of prokaryotic cell and eukaryotic cell, providing growth media containing at least
one amino acid analog selected from the group consisting of trans-4-hydroxyproline, 3-hydroxyproline,
cis-4-fluoro-L-proline and combinations thereof and contacting the cell with the growth media wherein the at least one amino acid
analog is assimilated into the cell and incorporated as a substitution for at least one naturally occurring amino acid in
at least one polypeptide.

[0017] A method of controlling the amount of an amino acid analog incorporated into a polypeptide is also provided
which includes providing at least a first cell selected from the group consisting of prokaryotic cell and eukaryotic cell,
providing a first growth media containing a first predetermined amount of at least one amino acid analog selected from
the group consisting of trans-4-hydroxyproline, 3-hydroxyproline, cis-4-fluoro-L-proline and combinations thereof and
contacting the first cell with the first growth media wherein a first amount of amino acid analog is assimilated into the
first cell and incorporated into at least one polypeptide. At least a second cell selected from the group consisting of
prokaryotic cell and eukaryotic cell is also provided along with a second growth media containing a second
predetermined amount of an amino acid analog selected from the group consisting of trans-4-hydroxyproline, 3-hydroxyproline,
cis-4-fluoro-L-proline and combinations thereof and the at least second cell is contacted with the second growth media
wherein a second amount of amino acid analog is assimilated into the second cell and incorporated into at least one
polypeptide.

[0018] Also provided is a method of increasing stability of a recombinant polypeptide produced by a cell which
includes providing a cell selected from the group consisting of prokaryotic cell and eukaryotic cell, and providing growth
media containing an amino acid analog selected from the group consisting of trans-4-hydroxyproline, 3-hydroxyproline,
cis-4-fluoro-L-proline and combinations thereof and contacting the cell with the growth media wherein the amino acid
analog is assimilated into the cell and incorporated into a recombinant polypeptide, thereby stabilizing the polypeptide.

[0019] A method of increasing uptake of an amino acid analog into a cell and causing formation of an amino acid
analog/tRNA complex is also provided which includes providing a cell selected from the group consisting of prokaryotic
cell and eukaryotic cell, providing hypertonic growth media containing amino acid analog selected from the group
consisting of trans-4-hydroxyproline, 3-hydroxyproline, cis-4-fluoro-L-proline and combinations thereof and contacting
the cell with the hypertonic growth media wherein the amino acid analog is assimilated into the cell and incorporated
into an amino acid analog/tRNA complex. In any of the other above methods, a hypertonic growth media can optionally
be incorporated to increase uptake of an amino acid analog into a cell.

[0020] A composition is provided which includes a cell selected from the group consisting of prokaryotic cell and
eukaryotic cell, and hypertonic media including an amino acid analog selected from the group consisting of
trans-4-hydroxyproline, 3-hydroxyproline, cis-4-fluoro-L-proline and combinations thereof.
[0021] Also provided is a method of producing an Extracellular Matrix Protein (EMP) or a fragment thereof capable
of providing a self-aggregate in a cell which does not ordinarily hydroxylate proline which includes providing a nucleic
acid sequence encoding the EMP or fragment thereof which has been optimized for expression in the cell by substitution
of codons preferred by the cell for naturally occurring codons not preferred by the cell, incorporating the nucleic acid
sequence into the cell, providing hypertonic growth media containing at least one amino acid selected from the group
consisting of trans-4-hydroxyproline and 3-hydroxyproline, and contacting the cell with the growth media wherein the
at least one amino acid is assimilated into the cell and incorporated into the EMP or fragment thereof.
[0022] Nucleic acid encoding a chimeric protein is provided which includes a domain from a physiologically active
peptide and a domain from an extracellular matrix protein (EMP) which is capable of providing a self-aggregate. The
nucleic acid may be inserted into a cloning vector which can then be incorporated into a cell.
[0023] Also provided is a chimeric protein including a domain from a physiologically active peptide and a domain
from an extracellular matrix protein (EMP) which is capable of providing a self-aggregate.

[0024] Also provided is human collagen produced by a prokaryotic cell, the human collagen being capable of
providing a self-aggregate.

[0025] Also provided is nucleic acid encoding a human Extracellular Matrix Protein (EMP) wherein the codon usage 
in the nucleic acid sequence reflects preferred codon usage in a prokaryotic cell. 
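
The substitution of host-preferred codons for synonymous, less preferred codons can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the construction method of the disclosure: the names `SYNONYMOUS_CODONS`, `PREFERRED_CODON` and `substitute_preferred_codons` are hypothetical, only glycine and proline are covered, and the choice of GGT and CCG as the preferred codons is an assumption made for the example.

```python
# The four synonymous DNA codons for glycine and for proline
# (standard genetic code).
SYNONYMOUS_CODONS = {
    "Gly": ("GGT", "GGC", "GGA", "GGG"),
    "Pro": ("CCT", "CCC", "CCA", "CCG"),
}

# Assumption for illustration: one preferred codon per amino acid in the host.
PREFERRED_CODON = {"Gly": "GGT", "Pro": "CCG"}

# Reverse lookup from codon to amino acid, built from the table above.
CODON_TO_RESIDUE = {
    codon: residue
    for residue, codons in SYNONYMOUS_CODONS.items()
    for codon in codons
}


def substitute_preferred_codons(cds):
    """Replace each Gly or Pro codon with the preferred synonymous codon.

    Codons for amino acids outside the illustrative table are left unchanged,
    so the encoded amino acid sequence is never altered.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    optimized = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        residue = CODON_TO_RESIDUE.get(codon)
        optimized.append(PREFERRED_CODON[residue] if residue else codon)
    return "".join(optimized)


if __name__ == "__main__":
    print(substitute_preferred_codons("GGACCCGGGCCTATG"))
```

Because every replacement is synonymous, the encoded polypeptide is unchanged; only the nucleotide sequence differs. Mapping every occurrence to a single codon is the simplest possible policy and is used here only to keep the example short.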

BRIEF DESCRIPTION OF THE DRAWINGS 

[0026] Figure 1 is a plasmid map illustrating pMAL-c2. 

[0027] Figure 2 is a graphical representation of the concentration of intracellular hydroxyproline based upon
concentration of trans-4-hydroxyproline in growth culture over time.

[0028] Figure 2A is a graphical representation of the concentration of intracellular hydroxyproline as a function of
sodium chloride concentration.

[0029] Figures 3A and 3B depict a DNA sequence encoding human Type I (α1) collagen (SEQ. ID. NO. 1).
[0030] Figure 4 is a plasmid map illustrating pHuCol.

[0031] Figure 5 depicts a DNA sequence encoding a fragment of human Type I (α1) collagen (SEQ. ID. NO. 2).
[0032] Figure 6 is a plasmid map illustrating pHuCol-FI.
[0033] Figure 7 depicts a DNA sequence encoding a collagen-like peptide wherein the region coding for gene
collagen-like peptide is underlined (SEQ. ID. NO. 3).

[0034] Figure 8 depicts an amino acid sequence of a collagen-like peptide (SEQ. ID. NO. 4). 
[0035] Figure 9 is a plasmid map illustrating pCLP. 

[0036] Figure 10 depicts a DNA sequence encoding mature bone morphogenic protein (SEQ. ID. NO. 5). 
[0037] Figure 11 is a plasmid map illustrating pCBC.

[0038] Figure 12 is a graphical representation of the percent incorporation of proline and trans-4-hydroxyproline into
maltose binding protein under various conditions.

[0039] Figure 13 depicts a collagen I (α1)/BMP-2B chimeric amino acid sequence (SEQ. ID. NO. 6).
[0040] Figures 14A-14C depict a collagen I (α1)/BMP-2B chimeric nucleotide sequence (SEQ. ID. NO. 7).
[0041] Figure 15 depicts a collagen I (α1)/TGF-β1 amino acid sequence (SEQ. ID. NO. 8).

[0042] Figures 16A-16C depict a collagen I (α1)/TGF-β1 nucleotide sequence (SEQ. ID. NO. 9). Lower case lettering
indicates non-coding sequence.

[0043] Figures 17A-17B depict a collagen I (α1)/decorin amino acid sequence (SEQ. ID. NO. 10).
[0044] Figure 18 depicts a collagen I (α1)/decorin peptide amino acid sequence (SEQ. ID. NO. 11).
[0045] Figures 19A-19D depict a collagen I (α1)/decorin nucleotide sequence (SEQ. ID. NO. 12).

[0046] Figures 20A-20C depict a collagen/decorin peptide nucleotide sequence (SEQ. ID. NO. 13). Lower case
lettering indicates non-coding sequence.

[0047] Figure 21 depicts a pMal cloning vector and polylinker cloning site. 

[0048] Figure 22 depicts a polylinker cloning site contained in the pMal cloning vector of Fig. 21 (SEQ. ID. NO. 14). 
[0049] Figure 23 depicts a pMal cloning vector containing a BMP/collagen nucleotide chimeric construct.

[0050] Figure 24 depicts a pMal cloning vector containing a TGF-β1/collagen nucleotide chimeric construct.

[0051] Figure 25 depicts a pMal cloning vector containing a decorin/collagen nucleotide chimeric construct. 

[0052] Figure 26 depicts a pMal cloning vector containing a decorin peptide/collagen nucleotide chimeric construct. 

[0053] Figures 27A-27E depict a human collagen Type I (α1) nucleotide sequence (SEQ. ID. NO. 15) and
corresponding amino acid sequence (SEQ. ID. NO. 16).

[0054] Figure 28 is a schematic diagram of the construction of the human collagen gene from synthetic
oligonucleotides.

[0055] Figure 29 is a schematic depiction of the amino acid sequence of chimeric proteins GST-ColECol (SEQ. ID.
NO. 17) and GST-D4 (SEQ. ID. NO. 18).
[0056] Figure 30 is a Table depicting occurrence of four proline and four glycine codons in the human Collagen Type
I (α1) gene with optimized codon usage (ColECol).

[0057] Figure 31 depicts a gel reflecting expression and dependence of expression of GST-D4 on hydroxyproline. 
[0058] Figure 32 depicts a gel showing expression of GST-D4 in hypertonic media. 
[0059] Figure 33 is a graph showing circular dichroism spectra of native and denatured D4 in neutral phosphate buffer.
[0060] Figure 34 depicts a gel representing digestion of D4 with bovine pepsin.

[0061] Figure 35 depicts a gel representing expression of GST-H Col and GST-ColECol under specified conditions.
[0062] Figure 36 depicts a gel representing expression of GST-CM4 in media with or without NaCl and either proline
or hydroxyproline.

[0063] Figure 37 depicts a gel of six hour post induction samples of GST-CM4 expressed in E. coli with varying
concentrations of NaCl.

[0064] Figure 38 depicts a gel of 4 hour post induction samples of GST-CM4 expressed in E. coli with constant 
amounts of hydroxyproline and varying amounts of proline. 

[0065] Figures 39A-39E depict the nucleotide (SEQ. ID. NO. 19) and amino acid (SEQ. ID. NO. 20) sequence of
HuCol Ec, the helical region of human Type I (α1) collagen plus 17 amino terminal extra-helical amino acids and 26
carboxy terminal extra-helical amino acids with codon usage optimized for E. coli.

[0066] Figure 40 depicts sequence and restriction maps of synthetic oligos used to reconstruct the first 243 base
pairs of the human Type I (α1) collagen gene with optimized E. coli codon usage. The synthetic oligos are labelled
N1-1 (SEQ. ID. NO. 21), N1-2 (SEQ. ID. NO. 22), N1-3 (SEQ. ID. NO. 23) and N1-4 (SEQ. ID. NO. 24).
[0067] Figure 41 depicts a plasmid map of pBSN1-1 containing a 114 base pair fragment of human collagen Type I
(α1) with optimized E. coli codon usage.

[0068] Figure 42 depicts the nucleotide (SEQ. ID. NO. 25) and amino acid (SEQ. ID. NO. 26) sequence of a fragment
of human collagen Type I (α1) gene with optimized E. coli codon usage encoded by plasmid pBSN1-1.
[0069] Figure 43 depicts a plasmid map of pBSN1-2 containing a 243 base pair fragment of human collagen Type I
(α1) with optimized E. coli codon usage.

[0070] Figure 44 depicts the nucleotide (SEQ. ID. NO. 27) and amino acid (SEQ. ID. NO. 28) sequence of a fragment
of human collagen Type I (α1) gene with optimized E. coli codon usage encoded by plasmid pBSN1-2.
[0071] Figure 45 depicts a plasmid map of pHuCol Ec containing human collagen Type I (α1) with optimized E. coli
codon usage.

[0072] Figure 46 depicts a plasmid map of pTrc N1-2 containing a 234 nucleotide human collagen Type I (α1) fragment
with optimized E. coli codon usage.

[0073] Figure 47 depicts a plasmid map of pN1-3 containing a 360 nucleotide human collagen Type I (α1) fragment
with optimized E. coli codon usage.

[0074] Figure 48 depicts a plasmid map of pD4 containing a 657 nucleotide human collagen Type I (α1) 3' fragment
with optimized E. coli codon usage.

[0075] Figures 49A-49E depict the nucleotide (SEQ. ID. NO. 29) and amino acid (SEQ. ID. NO. 30) sequence of a
helical region of human Type I (α2) collagen plus 11 amino terminal extra-helical amino acids and 12 carboxy terminal
extra-helical amino acids.

[0076] Figures 50A-50E depict the nucleotide (SEQ. ID. NO. 31) and amino acid (SEQ. ID. NO. 32) sequence of
HuCol(α2) Ec, the helical region of human Type I (α2) collagen plus 11 amino terminal extra-helical amino acids and 12
carboxy terminal extra-helical amino acids with codon usage optimized for E. coli.

[0077] Figure 51 depicts sequence and restriction maps of synthetic oligos used to reconstruct the first 240 base
pairs of human Type I (α2) collagen gene with optimized E. coli codon usage. The synthetic oligos are labelled N1-1
(α2) (SEQ. ID. NO. 33), N1-2 (α2) (SEQ. ID. NO. 34), N1-3 (α2) (SEQ. ID. NO. 35) and N1-4 (α2) (SEQ. ID. NO. 36).
[0078] Figure 52 depicts a plasmid map of pBSN1-1 (α2) containing a 117 base pair fragment of human collagen Type
I (α2) with optimized E. coli codon usage.

[0079] Figure 53 depicts a plasmid map of pBSN1-2 (α2) containing a 240 base pair fragment of human collagen
Type I (α2) with optimized E. coli codon usage.

[0080] Figure 54 depicts the nucleotide (SEQ. ID. NO. 37) and amino acid (SEQ. ID. NO. 38) sequence of a fragment
of human collagen Type I (α2) gene with optimized E. coli codon usage encoded by plasmid pBSN1-2 (α2).
[0081] Figure 55 depicts a plasmid map of pHuCol(α2) Ec containing the entire human collagen Type I (α2) gene with
optimized E. coli codon usage.

[0082] Figure 56 depicts a plasmid map of pN1-2 (α2) containing a 240 base pair fragment of human collagen Type
I (α2) with optimized E. coli codon usage.

[0083] Figure 57 depicts a gel reflecting expression of GST and TGF-β1 under specified conditions.
[0084] Figure 58 depicts a gel reflecting expression of MBP, FN-BMP-2A, FN-TGF-β1 and FN under specified
conditions.

[0085] Figure 59 depicts a gel showing expression of GST-Coll under specified conditions. 

[0086] Figure 60 depicts a plasmid map of pGST-CM4 containing the gene for glutathione S-transferase fused to
the gene for collagen mimetic 4.

[0087] Figure 61 depicts the nucleotide (SEQ. ID. NO. 39) and amino acid (SEQ. ID. NO. 40) sequence of collagen 
mimetic 4. 
[0088] Figure 62A depicts a chromatogram of the elution of hydroxyproline-containing collagen mimetic 4 from a
Poros RP2 column. The arrow indicates the peak containing hydroxyproline-containing collagen mimetic 4.
[0089] Figure 62B depicts a chromatogram of the elution of proline-containing collagen mimetic 4 from a Poros RP2
column. The arrow indicates the peak containing proline-containing collagen mimetic 4.
[0090] Figure 63A depicts a chromatogram of a proline amino acid standard (250 pmol).

[0091] Figure 63B depicts a chromatogram of a hydroxyproline amino acid standard (250 pmol).

[0092] Figure 63C depicts an amino acid analysis chromatogram of the hydrolysis of proline-containing collagen
mimetic 4.

[0093] Figure 63D depicts an amino acid analysis chromatogram of the hydrolysis of hydroxyproline-containing
collagen mimetic 4.

[0094] Figure 64 is a graph of OD600 versus time for cultures of E. coli JM109 (F-) grown to plateau and then 
supplemented with various amino acids. 

[0095] Figure 65 depicts a plasmid map of pcEc-α1 containing the gene for HuCol(α1) Ec.

[0096] Figure 66 depicts a plasmid map of pcEc-α2 containing the gene for HuCol(α2) Ec.
[0097] Figure 67 depicts a plasmid map of pD4-α1 containing the gene for a 219 amino acid C-terminal fragment of
Type I (α1) human collagen with optimized E. coli codon usage fused to the gene for glutathione S-transferase.

[0098] Figure 68 depicts a plasmid map of pD4-α2 containing the gene for a 207 amino acid C-terminal fragment of
Type I (α2) human collagen with optimized E. coli codon usage fused to the gene for glutathione S-transferase.

[0099] Figure 69 depicts the predicted amino acid sequence from the DNA sequence of the first 13 amino acids
of protein D4-α1 (SEQ. ID. NO. 41) and the amino acid sequence as experimentally determined (SEQ. ID. NO. 42).

[0100] Figure 70 depicts the mass spectrum of hydroxyproline-containing D4-α1.

[0101] Figure 71 depicts the nucleotide sequence of a 657 nucleotide human collagen Type I (α1) 3' fragment with
optimized E. coli codon usage designated D4 (SEQ. ID. NO. 43).

[0102] Figure 72 depicts the amino acid sequence of a 219 amino acid C-terminal fragment of human collagen Type
I (α1) designated D4 (SEQ. ID. NO. 44).

[0103] Figure 73 is a plasmid map illustrating pGEX-4T.1 containing the gene for glutathione S-transferase.

[0104] Figure 74 is a plasmid map illustrating pTrc-TGF containing the gene for the mature human TGF-β1 polypeptide.

[0105] Figure 75 is a plasmid map illustrating pTrc-Fn containing the gene for a 70 kDa fragment of human fibronectin.

[0106] Figure 76 is a plasmid map illustrating pTrc-Fn-TGF containing the gene for a fusion protein of a 70 kDa
fragment of human fibronectin and the mature human TGF-β1 polypeptide.

[0107] Figure 77 is a plasmid map illustrating pTrc-Fn-BMP containing the gene for a fusion protein of a 70 kDa 
fragment of human fibronectin and human bone morphogenic protein 2A. 

[0108] Figure 78 is a plasmid map illustrating pGEX-HuColl Ec containing the gene for a fusion between glutathione
S-transferase and Type I (α1) human collagen with optimized E. coli codon usage.

[0109] Figure 79 depicts the nucleotide sequence of a 627 nucleotide human collagen Type I (α2) 3' fragment with
optimized E. coli codon usage (SEQ. ID. NO. 45).

[0110] Figure 80 depicts the amino acid sequence of a 209 amino acid C-terminal fragment of human collagen Type
I (α2) (SEQ. ID. NO. 46).

[0111] Figure 81 depicts the sequence of synthetic oligos used to reconstruct the first 282 base pairs of the gene for
the carboxy terminal 219 amino acids of human Type I (α1) collagen with optimized E. coli codon usage designated
N4-1 (SEQ. ID. NO. 47), N4-2 (SEQ. ID. NO. 48), N4-3 (SEQ. ID. NO. 49) and N4-4 (SEQ. ID. NO. 50).



DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 


[0112] Prokaryotic cells and eukaryotic cells can unexpectedly be made to assimilate and incorporate trans-4-hydroxyproline
into proteins contrary to both Papas et al. and Deming et al., supra. Such assimilation and incorporation
is especially useful when the structure and function of a polypeptide depends on post translational hydroxylation of
proline not provided by the native protein production system of a recombinant host. Thus, prokaryotic bacteria such
as E. coli and eukaryotic cells such as Saccharomyces cerevisiae, Saccharomyces carlsbergensis and Schizosaccharomyces
pombe that ordinarily do not hydroxylate proline and additional eukaryotes such as insect cells including
lepidopteran cell lines including Spodoptera frugiperda, Trichoplusia ni, Heliothis virescens, Bombyx mori infected with
a baculovirus; CHO cells, COS cells and NIH 3T3 cells which fail to adequately produce certain polypeptides whose
structure and function depend on such hydroxylation can be made to produce polypeptides having hydroxylated prolines.
Incorporation includes adding trans-4-hydroxyproline to a polypeptide, for example, by first changing an amino
acid to proline, creating a new proline position that can in turn be substituted with trans-4-hydroxyproline, or substituting
a naturally occurring proline in a polypeptide with trans-4-hydroxyproline as well.

[0113] The process of producing recombinant polypeptides in mass producing organisms is well known. Replicable 
expression vectors such as plasmids, viruses, cosmids and artificial chromosomes are commonly used to transport
genes encoding desired proteins from one host to another. It is contemplated that any known method of cloning a gene, 
ligating the gene into an expression vector and transforming a host cell with such expression vector can be used in 
furtherance of the present disclosure. 

[0114] Incorporation of trans-4-hydroxyproline into polypeptides which depend upon trans-4-hydroxyproline
for chemical and physical properties is useful not only in production systems which do not have the appropriate systems for
converting proline to trans-4-hydroxyproline, but also in studying the structure and function of polypeptides
which do not normally contain trans-4-hydroxyproline. It is contemplated that the following amino acid analogs may
be incorporated in accordance with the present disclosure: trans-4-hydroxyproline, 3-hydroxyproline, cis-4-fluoro-L-proline
and combinations thereof (hereinafter referred to as the "amino acid analogs"). Use of prokaryotes and eukaryotes
is desirable since they allow relatively inexpensive mass production of such polypeptides. It is contemplated
that the amino acid analogs can be incorporated into any desired polypeptide. In a preferred embodiment the prokaryotic
cells and eukaryotic cells are starved for proline by decreasing or eliminating the amount of proline in growth media
prior to addition of an amino acid analog herein.

[0115] Expression vectors containing the gene for maltose binding protein (MBP), e.g., see Figure 1 illustrating plasmid
pMAL-c2, commercially available from New England Biolabs, are transformed into prokaryotes such as E. coli
proline auxotrophs or eukaryotes such as S. cerevisiae auxotrophs which depend upon externally supplied proline for
protein synthesis and anabolism. Other preferred expression vectors for use in prokaryotes are commercially available
plasmids which include pKK-223 (Pharmacia), pTRC (Invitrogen), pGEX (Pharmacia), pET (Novagen) and pQE (Qiagen).
It should be understood that any suitable expression vector may be utilized by those with skill in the art.

[0116] Substitution of the amino acid analogs for proline in protein synthesis occurs since prolyl tRNA synthetase is
sufficiently promiscuous to allow misacylation of proline tRNA with any one of the amino acid analogs. A sufficient
quantity, i.e., typically ranging from about 0.001 M to about 1.0 M, but more preferably from about 0.005 M to about 0.5 M,
of the amino acid analog(s) is added to the growth medium for the transformed cells to compete with proline in cellular
uptake. After sufficient time, generally from about 30 minutes to about 24 hours or more, the amino acid analog(s) is
assimilated by the cell and incorporated into protein synthetic pathways. As can be seen from Figures 2 and 2A,
intracellular concentration of trans-4-hydroxyproline increases by increasing the concentration of sodium chloride in
the growth media. In a preferred embodiment the prokaryotic cells and/or eukaryotic cells are starved for proline by
decreasing or eliminating the amount of proline in growth media prior to addition of an amino acid analog herein.
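
The concentration ranges stated above can be converted into weighed amounts with simple arithmetic. The following is a minimal sketch, not part of the disclosed method: it assumes the analog is trans-4-hydroxyproline (C5H9NO3, molar mass about 131.13 g/mol), and the function name `grams_needed` is introduced here purely for illustration.

```python
# Molar mass of trans-4-hydroxyproline (C5H9NO3) in g/mol.
# Assumption: a different analog would need its own molar mass.
HYP_MOLAR_MASS = 131.13


def grams_needed(molarity_mol_per_l: float, volume_l: float,
                 molar_mass: float = HYP_MOLAR_MASS) -> float:
    """Return grams of analog required for the given molarity and volume."""
    if molarity_mol_per_l < 0 or volume_l < 0:
        raise ValueError("molarity and volume must be non-negative")
    return molarity_mol_per_l * volume_l * molar_mass


if __name__ == "__main__":
    # Limits of the broad and preferred ranges, per litre of growth medium.
    for molarity in (0.001, 0.005, 0.5, 1.0):
        grams = grams_needed(molarity, 1.0)
        print(f"{molarity:>6} M -> {grams:8.3f} g per litre")
```

For other analogs in the list, only the molar mass argument would change.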

[0117] Expression vectors containing the gene for human Type I (α1) collagen (DNA sequence illustrated in Figures
3 and 3A; plasmid map illustrated in Figure 4) are transformed into prokaryotic or eukaryotic proline auxotrophs which
depend upon externally supplied proline for protein synthesis and anabolism. As above, substitution of the amino acid
analog(s) occurs since prolyl tRNA synthetase is sufficiently promiscuous to allow misacylation of proline tRNA with
the amino acid analog(s). The quantity of amino acid analog(s) in media given above is again applicable.

[0118] Expression vectors containing DNA encoding fragments of human Type I (α1) collagen (e.g., DNA sequence
illustrated in Figure 5 and plasmid map illustrated in Figure 6) are transformed into prokaryotic or eukaryotic auxotrophs
as above. Likewise, expression vectors containing DNA encoding collagen-like polypeptide (e.g., DNA sequence illustrated
in Figure 7, amino acid sequence illustrated in Figure 8 and plasmid map illustrated in Figure 9) can be used
to transform prokaryotic or eukaryotic auxotrophs as above. Collagen-like peptides are those which contain at least
partial homology with collagen and exhibit similar chemical and physical characteristics to collagen. Thus, collagen-like
peptides consist, e.g., of repeating arrays of Gly-X-Y triplets in which about 35% of the X and Y positions are
occupied by proline and 4-hydroxyproline. Collagen-like peptides are interchangeably referred to herein as collagen-like
proteins, collagen-like polypeptides, collagen mimetic polypeptides and collagen mimetic. Certain preferred collagen
fragments and collagen-like peptides in accordance herewith are capable of assembling into an extracellular matrix.
In both collagen fragments and collagen-like peptides as described above, substitution with amino acid analog(s) occurs
since prolyl tRNA synthetase is sufficiently promiscuous to allow misacylation of proline tRNA with one or more of the
amino acid analog(s). The quantity of amino acid analog(s) given above is again applicable.
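
The Gly-X-Y description above can be checked mechanically for a candidate sequence. The sketch below is illustrative only: it assumes one-letter amino acid codes, uses the letter `O` for 4-hydroxyproline (a common convention, not one stated in this disclosure), and the example sequence and the function name `gly_xy_stats` are hypothetical.

```python
def gly_xy_stats(seq: str, imino: str = "PO") -> dict:
    """Summarise a sequence as Gly-X-Y triplets.

    Returns the number of triplets, whether every triplet starts with Gly,
    and the fraction of X and Y positions occupied by proline ('P') or
    4-hydroxyproline ('O', an assumed one-letter code).
    """
    seq = seq.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # X and Y are the second and third residues of each triplet.
    xy_positions = "".join(t[1:] for t in triplets)
    imino_count = sum(1 for residue in xy_positions if residue in imino)
    return {
        "triplets": len(triplets),
        "is_gly_x_y": all(t[0] == "G" for t in triplets),
        "imino_fraction": imino_count / len(xy_positions),
    }


if __name__ == "__main__":
    # Hypothetical 12-residue collagen-like stretch.
    print(gly_xy_stats("GPOGPAGEKGLO"))
```

A sequence meeting the description would report `is_gly_x_y` as true and an `imino_fraction` in the neighbourhood of 0.35.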

[0119] It is contemplated that any polypeptide having an extracellular matrix protein domain such as a collagen,
collagen fragment or collagen-like peptide domain can be made to incorporate amino acid analog(s) in accordance
with the disclosure herein. Such polypeptides include a collagen, a collagen fragment or a collagen-like peptide domain
and a domain having a region incorporating one or more physiologically active agents such as glycoproteins, proteins,
peptides and proteoglycans. As used herein, physiologically active agents exert control over or modify existing physiologic
functions in living things. Physiologically active agents include hormones, growth factors, enzymes, ligands and
receptors. Many active domains of physiologically active agents have been defined and isolated. It is contemplated
that polypeptides having a collagen, collagen fragment or collagen-like peptide domain can also have a domain incorporating
one or more physiologically active domains which are active fragments of such physiologically active agents.
As used herein, physiologically active agent is meant to include entire peptides, polypeptides, proteins, glycoproteins,
proteoglycans and active fragments of any of them. Thus, chimeric proteins are made to incorporate amino acid analog(s)
by transforming a prokaryotic proline auxotroph or a eukaryotic proline auxotroph with an appropriate expression
vector and contacting the transformed auxotroph with growth media containing at least one of the amino acid analogs.
For example, a chimeric collagen/bone morphogenic protein (BMP) construct or various chimeric collagen/growth factor 
constructs are useful in accordance herein. Such growth factors are well-known and include insulin-like growth factor, 
transforming growth factor, platelet derived growth factor and the like. Figure 10 illustrates DNA of BMP which can be 
fused to the 3' terminus of DNA encoding collagen, DNA encoding a collagen fragment or DNA encoding a collagen- 
like peptide. Figure 11 illustrates a map of plasmid pCBC containing a collagen/BMP construct. In a preferred embod- 
iment, proteins having a collagen, collagen fragment or collagen-like peptide domain assemble or aggregate to form 
an extracellular matrix which can be used as a surgical implant. The property of self-aggregation as used herein includes 
the ability to form an aggregate with the same or similar molecules or to form an aggregate with different molecules 
that share the property of aggregation to form, e.g., a double or triple helix. An example of such aggregation is the 
structure of assembled collagen matrices. 

[0120] Indeed, chimeric polypeptides which may also be referred to herein as chimeric proteins provide an integrated 
combination of a therapeutically active domain from a physiologically active agent and one or more EMP moieties. The 
EMP domain provides an integral vehicle for delivery of the therapeutically active moiety to a target site. The two 
domains are linked covalently by one or more peptide bonds contained in a linker region. As used herein, integrated 
or integral means characteristics which result from the covalent association of one or more domains of the chimeric 
proteins. The therapeutically active moieties disclosed herein are typically made of amino acids linked to form peptides, 
polypeptides, proteins, glycoproteins or proteoglycans. As used herein, peptide encompasses polypeptides and pro- 
teins. 

[0121] The inherent characteristics of EMPs are ideal for use as a vehicle for the therapeutic moiety. One such 
characteristic is the ability of the EMPs to form the self-aggregate. Examples of suitable EMPs are collagen, elastin, 
fibronectin, fibrinogen and fibrin. Fibrillar collagens (Type I, II and III) assemble into ordered polymers and often ag- 
gregate into larger bundles. Type IV collagen assembles into sheetlike meshworks. Elastin molecules form filaments 
and sheets in which the elastin molecules are highly cross-linked to one another to provide good elasticity and high 
tensile strength. The cross-linked, random-coiled structure of the fiber network allows it to stretch and recoil like a
rubber band. Fibronectin is a large fibril forming glycoprotein, which, in one of its forms, consists of highly insoluble
fibrils cross-linked to each other by disulfide bonds. Fibrin is an insoluble protein formed from fibrinogen by the prote- 
olytic activity of thrombin during the normal clotting of blood. 

[0122] The molecular and macromolecular morphology of the above EMPs defines networks or matrices to provide 
substratum or scaffolding in integral covalent association with the therapeutically active moiety. The networks or ma- 
trices formed by the EMP domain provide an environment particularly well suited for ingrowth of autologous cells 
involved in growth, repair and replacement of existing tissue. The integral therapeutically active moieties covalently 
bound within the networks or matrices provide maximum exposure of the active agents to their targets to elicit a desired 
response. 

[0123] Implants formed of or from the present chimeric proteins provide sustained release activity in or at a desired 
locus or target site. Since it is linked to an EMP domain, the therapeutically active domain of the present chimeric 
protein is not free to separately diffuse or otherwise be transported away from the vehicle which carries it, absent 
cleavage of peptide bonds. Consequently, chimeric proteins herein provide an effective anchor for therapeutic activity 
which allows the activity to be confined to a target location for a prolonged duration. Because the supply of therapeu- 
tically active agent does not have to be replenished as often when compared to non-sustained release dosage forms, 
smaller amounts of therapeutically active agent may be used over the course of therapy. Consequently, certain advan- 
tages provided by the present chimeric proteins are a decrease or elimination of local and systemic side effects, less 
potentiation or reduction in therapeutic activity with chronic use, and minimization of drug accumulation in body tissue 
with chronic dosing. 

[0124] Use of recombinant technology allows manufacturing of non-immunogenic chimeric proteins. The DNA en- 
coding both the therapeutically active moiety and the EMP moiety should preferably be derived from the same species 
as the patient being treated to avoid an immunogenic reaction. For example, if the patient is human, the therapeutically 
active moiety as well as the EMP moiety is preferably derived from human DNA. 

[0125] Osteogenic/EMP chimeric proteins provide biodegradable and biocompatible agents for inducing bone formation
at a desired site. As stated above, in one embodiment, a BMP moiety is covalently linked with an EMP to form
a chimeric protein. The BMP moiety induces osteogenesis and the extracellular matrix protein moiety provides an integral
substratum or scaffolding for the BMP moiety and cells which are involved in reconstruction and growth. Compositions
containing the BMP/EMP chimeric protein provide effective sustained release delivery of the BMP moiety to desired
target sites. The method of manufacturing such an osteogenic agent is efficient because the need for extra time-consuming
steps such as purifying EMP and then admixing it with the purified BMP is eliminated. An added advantage of the
BMP/EMP chimeric protein results from the stability created by the covalent bond between BMP and the EMP, i.e., the
BMP portion is not free to separately diffuse away from the EMP, thus providing a more stable therapeutic agent.

[0126] Bone morphogenic proteins are a class identified as BMP-1 through BMP-9. A preferred osteogenic protein for
use in human patients is human BMP-2B. A BMP-2B/collagen IA chimeric protein is illustrated in Fig. 13 (SEQ. ID.
NO. 6). The protein sequence illustrated in Fig. 15 (SEQ. ID. NO. 8) includes a collagen helical domain depicted at
amino acids 1-1057 and a mature form of BMP-2B at amino acids 1060-1169. The physical properties of the chimeric
protein are dominated in part by the EMP component. In the case of a collagen moiety, a concentrated solution of
chimeric protein will have a gelatinous consistency that allows easy handling by the medical practitioner. The EMP
moiety acts as a sequestering agent to prevent rapid desorption of the BMP moiety from the desired site and to provide
sustained release of BMP activity. As a result, the BMP moiety remains at the desired site and provides sustained
release of BMP activity at the desired site for a period of time necessary to effectively induce bone formation. The EMP
moiety also provides a matrix which allows a patient's autologous cells, e.g., chondrocytes and the like, which are
normally involved in osteogenesis to collect therein and form an autologous network for new tissue growth. The gelatinous
consistency of the chimeric protein also provides a useful and convenient therapeutic manner for immobilizing
active BMP on a suitable vehicle or implant for delivering the BMP moiety to a site where bone growth is desired.

[0127] The BMP moiety and the EMP moiety are optionally linked together by linker sequences of amino acids.
Examples of linker sequences used are illustrated within the sequence depicted in Figs. 14A-14C (SEQ. ID. NO. 7),
16A-16C (SEQ. ID. NO. 9), 19A-19C (SEQ. ID. NO. 12) and 20A-20C (SEQ. ID. NO. 13), and are described in more
detail below. Linker sequences may be chosen based on particular properties which they impart to the chimeric protein.
For example, amino acid sequences such as Ile-Glu-Gly-Arg and Leu-Val-Pro-Arg are cleaved by Factor Xa and
thrombin enzymes, respectively. Incorporating sequences which are cleaved by proteolytic enzymes into chimeric proteins
herein provides cleavage at the linker site upon exposure to the appropriate enzyme and separation of the two
domains into separate entities. It is contemplated that numerous linker sequences can be incorporated into any of the
chimeric proteins.
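
The effect of a cleavable linker can be illustrated by locating the recognition sequence in a chimeric sequence and splitting the chain there. This is a minimal sketch under stated assumptions: one-letter codes are used (Ile-Glu-Gly-Arg as `IEGR`, Leu-Val-Pro-Arg as `LVPR`), cleavage is assumed to occur on the C-terminal side of the Arg residue, and the example sequence and the function name `cleave_at_linker` are hypothetical.

```python
from __future__ import annotations

# Recognition sequences named in the text, in one-letter code.
LINKER_SITES = {
    "Factor Xa": "IEGR",  # Ile-Glu-Gly-Arg
    "thrombin": "LVPR",   # Leu-Val-Pro-Arg
}


def cleave_at_linker(protein: str, enzyme: str) -> list[str]:
    """Split a sequence after each occurrence of the enzyme's site.

    Assumption: the cut is placed immediately after the final Arg of the
    recognition sequence.
    """
    site = LINKER_SITES[enzyme]
    fragments = []
    start = 0
    pos = protein.find(site)
    while pos != -1:
        cut = pos + len(site)
        fragments.append(protein[start:cut])
        start = cut
        pos = protein.find(site, start)
    tail = protein[start:]
    if tail:
        fragments.append(tail)
    return fragments


if __name__ == "__main__":
    # Hypothetical EMP-like stretch, a Factor Xa linker, then an active domain.
    chimera = "GPPGPPIEGRQAKHK"
    print(cleave_at_linker(chimera, "Factor Xa"))
    print(cleave_at_linker(chimera, "thrombin"))
```

A sequence lacking the chosen site is returned as a single fragment, which mirrors a chimeric protein that stays intact when exposed to the other enzyme.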

[0128] In another embodiment, a chimeric DNA construct includes a gene encoding an osteogenic protein or a fragment
thereof linked to a gene encoding an EMP or a fragment thereof. The gene sequences for various BMPs are known,
see, e.g., U.S. Patent Nos. 4,294,753, 4,761,471, 5,106,748, 5,187,076, 5,141,905, 5,108,922, 5,116,738 and
5,168,050, each incorporated herein by reference. A BMP-2B gene for use herein is synthesized by ligating oligonucleotides
encoding a BMP protein. The oligonucleotides encoding BMP-2B are synthesized using an automated DNA
synthesizer (Beckman Oligo-1000). In a preferred embodiment, the nucleotide sequence encoding the BMP is maximized
for expression in E. coli. This is accomplished by using E. coli utilization tables to translate the sequence of amino acids
of the BMP into codons that are utilized most often by E. coli. Alternatively, native DNA encoding BMP isolated from
mammals including humans may be purified and used.
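
Translating an amino acid sequence into the codons a host uses most often amounts to a table lookup. The sketch below shows the idea only; the codon table is an illustrative assumption (one commonly cited frequent E. coli codon per amino acid) and is not taken from this disclosure, so a real design would substitute values from an actual utilization table. The names `PREFERRED_CODON`, `back_translate` and `translate` are hypothetical.

```python
# Illustrative table: one frequently used E. coli codon per amino acid.
# Assumption: these choices stand in for a real codon utilization table.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}

# Inverse map, valid because each codon above is used for one residue only.
CODON_TO_RESIDUE = {codon: aa for aa, codon in PREFERRED_CODON.items()}


def back_translate(protein: str) -> str:
    """Return a DNA sequence encoding the protein with the table's codons."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None


def translate(dna: str) -> str:
    """Translate DNA built from the table's codons back to protein."""
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TO_RESIDUE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    peptide = "GPPGAK"  # hypothetical collagen-like stretch
    dna = back_translate(peptide)
    print(dna)
    print(translate(dna) == peptide)
```

The round trip through `translate` is a quick check that the codon choice changed only the nucleotide sequence, not the encoded protein.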

[0129] The BMP gene and the DNA sequence encoding an extracellular matrix protein are cloned by standard genetic
engineering methods as described in Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor,
1989, hereby incorporated by reference.

[0130] The DNA sequence corresponding to the helical and telopeptide region of collagen I(α1) is cloned from a
human fibroblast cell line. Two sets of polymerase chain reactions are carried out using cDNA prepared by standard
methods from AG02261A cells. The first pair of PCR primers includes a 5' primer bearing an XmnI linker sequence and
a 3' primer bearing the BsmI site at nucleotide number 1722. The resulting PCR product consists of sequence from
position 1 to 1722. The second pair of primers includes the BsmI site at 1722 and a linker sequence at the 3' end
bearing a BglII site. The resulting PCR product consists of sequence from position 1722 to 3196. The complete sequence
is assembled by standard cloning techniques. The two PCR products are ligated together at the BsmI site, and
the combined clone is inserted into any vector with XmnI-BglII sites such as pMAL-c2 vector.

[0131] To clone the BMP-2B gene, total cellular RNA is isolated from human osteosarcoma cells (U-2OS) by the
method described by Robert E. Farrell Jr. (Academic Press, CA, 1993, pp. 68-69) (herein incorporated by reference).
The integrity of the RNA is verified by spectrophotometric analysis and electrophoresis through agarose gels. Typical
yields of total RNA are 50 µg from a 100 mm confluent tissue culture dish. The RNA is used to generate cDNA by
reverse transcription using the Superscript pre-amplification system by Gibco BRL. The cDNA is used as template for
PCR amplification using upstream and downstream primers specific for BMP-2B (GenBank HUMBMP2B accession
#M22490). The resulting PCR product consists of BMP-2B sequence from position 1289-1619. The PCR product is
resolved by electrophoresis through agarose gels, purified with gene clean (BIO 101) and ligated into pMal-c2 vector
(New England Biolabs). The domain of human collagen I(α1) chain is cloned in a similar manner. However, the total
cellular RNA is isolated from a human fibroblast cell line (AG02261A human skin fibroblasts).

[0132] A chimeric BMP/EMP DNA construct is obtained by ligating a synthetic BMP gene to a DNA sequence encoding
an EMP such as collagen, fibrinogen, fibrin, fibronectin, elastin or laminin. However, chimeric polypeptides
herein are not limited to these particular proteins. Figs. 14A-14C (SEQ. ID. NO. 7) illustrate a DNA construct which
encodes a BMP-2B/collagen I(α1) chimeric protein. The coding sequence for an EMP may be ligated upstream and/or
downstream and in-frame with a coding sequence for the BMP. The DNA encoding an EMP may be a portion of the
gene or an entire EMP gene. Furthermore, two different EMPs may be ligated upstream and downstream from the BMP.

[0133] The BMP-2B/collagen I(α1) chimeric protein illustrated in Figs. 14A-14C includes an XmnI linker sequence at
base pairs (bp) 1-19, a collagen domain (bp 20-3190), a BglII/BamHI linker sequence (bp 3191-3196), a mature form
of BMP-2B (bp 3197-3529) and a HindIII linker sequence (bp 3530-3535).
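
The base pair coordinates listed for this construct can be checked for contiguity and converted to codon counts. The following sketch uses only the coordinates given above; the helper names `check_contiguous` and `segment_lengths` are introduced here for illustration. Note that the collagen domain spans 3171 bp, i.e. 1057 codons, which agrees with the collagen domain being described as amino acids 1-1057.

```python
# (name, first bp, last bp), inclusive, as listed for Figs. 14A-14C.
SEGMENTS = [
    ("XmnI linker", 1, 19),
    ("collagen domain", 20, 3190),
    ("BglII/BamHI linker", 3191, 3196),
    ("mature BMP-2B", 3197, 3529),
    ("HindIII linker", 3530, 3535),
]


def check_contiguous(segments) -> None:
    """Raise ValueError if consecutive segments leave a gap or overlap."""
    for (prev_name, _, prev_end), (name, start, _) in zip(segments, segments[1:]):
        if start != prev_end + 1:
            raise ValueError(f"{name} does not directly follow {prev_name}")


def segment_lengths(segments) -> dict:
    """Return the length in base pairs of each named segment."""
    return {name: end - start + 1 for name, start, end in segments}


if __name__ == "__main__":
    check_contiguous(SEGMENTS)
    for name, length in segment_lengths(SEGMENTS).items():
        codons, remainder = divmod(length, 3)
        note = f"{codons} codons" if remainder == 0 else "not a whole number of codons"
        print(f"{name:20s} {length:5d} bp  {note}")
```

The same check applies to the other constructs described below by substituting their coordinates.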

[0134] Any combination of growth factor and matrix protein sequences are contemplated including repeating units, 
or multiple arrays of each segment in any order. 

[0135] Incorporation of fragments of both matrix and growth factor proteins is also contemplated. For example, in 
the case of collagen, only the helical domain may be included. Other matrix proteins have defined domains, such as 
laminin, which has EGF-like domains. In these cases, specific functionalities can be chosen to achieve desired effects. 
Moreover, it may be useful to combine domains from disparate matrix proteins, such as the helical region of collagen 
and the cell attachment regions of fibronectin. In the case of growth factors, specific segments have been shown to 
be removed from the mature protein by post translational processing. Chimeric proteins can be designed to include 
only the mature biologically active region. For example, in the case of BMP-2B only the final 110 amino acids are found 
in the active protein. 

[0136] In another embodiment, a transforming growth factor (TGF) moiety is covalently linked with an EMP to form
a chimeric protein. The TGF moiety increases efficacy of the body's normal soft tissue repair response and also induces
osteogenesis. Consequently, TGF/EMP chimeric proteins may be used for either or both functions. One of the fundamental
properties of the TGF-βs is their ability to turn on various activities that result in the synthesis of new connective
tissue. See, Piez and Sporn eds., Transforming Growth Factor-βs: Chemistry, Biology and Therapeutics, Annals of the
New York Academy of Sciences, Vol. 593, (1990). TGF-β is known to exist in at least five different isoforms. The DNA
sequence for human TGF-β1 is known and has been cloned. See Derynck et al., Human Transforming Growth Factor-Beta
cDNA Sequence and Expression in Tumour Cell Lines, Nature, Vol. 316, pp. 701-705 (1985), herein incorporated
by reference. TGF-β2 has been isolated from bovine bone, human glioblastoma cells and porcine platelets. TGF-β3
has also been cloned. See ten Dijke, et al., Identification of a New Member of the Transforming Growth Factor-β Gene
Family, Proc. Natl. Acad. Sci. (USA), Vol. 85, pp. 4715-4719 (1988) herein incorporated by reference.

[0137] A TGF-β/EMP chimeric protein incorporates the known activities of TGF-βs and provides integral scaffolding
or substratum of the EMP as described above to yield a composition which further provides sustained release focal
delivery at target sites.

[0138] The TGF-β moiety and the EMP moiety are optionally linked together by linker sequences of amino acids.
Linker sequences may be chosen based upon particular properties which they impart to the chimeric protein. For
example, amino acid sequences such as Ile-Glu-Gly-Arg and Leu-Val-Pro-Arg are cleaved by Factor Xa and thrombin
enzymes, respectively. Incorporating sequences which are cleaved by proteolytic enzymes into the chimeric protein
provides cleavage at the linker site upon exposure to the appropriate enzyme and separation of the domains into
separate entities. Fig. 15 depicts an amino acid sequence for a TGF-β1/collagen IA chimeric protein (SEQ. ID. NO. 8).
The illustrated amino acid sequence includes the collagen domain (1-1057) and a mature form of TGF-β1 (1060-1171).

[0139] A chimeric DNA construct includes a gene encoding TGF-β1 or a fragment thereof, or a gene encoding TGF-β2
or a fragment thereof, or a gene encoding TGF-β3 or a fragment thereof, ligated to a DNA sequence encoding an
EMP protein such as collagen (I-IV), fibrin, fibrinogen, fibronectin, elastin or laminin. A preferred chimeric DNA construct
combines DNA encoding TGF-β1, a DNA linker sequence, and DNA encoding collagen IA. A chimeric DNA construct
containing a TGF-β1 gene and a collagen I(α1) gene is shown in Figs. 16A-16C (SEQ. ID. NO. 9). The illustrated construct
includes an XmnI linker sequence (bp 1-19), DNA encoding a collagen domain (bp 20-3190), a BglII linker sequence
(bp 3191-3196), DNA encoding a mature form of TGF-β1 (bp 3197-3535), and an XbaI linker sequence (bp 3536-3541).

[0140] The coding sequence for EMP may be ligated upstream and/or downstream and in-frame with a coding sequence
for the TGF-β. The DNA encoding the extracellular matrix protein may encode a portion or a fragment of the
EMP or may encode the entire EMP. Likewise, the DNA encoding the TGF-β may be one or more fragments thereof
or the entire gene. Furthermore, two or more different TGF-βs or two or more different EMPs may be ligated upstream
or downstream of alternate moieties.

[0141] In yet another embodiment, a dermatan sulfate proteoglycan moiety, also known as decorin or proteoglycan 
II, is covalently linked with an EMP to form a chimeric protein. Decorin is known to bind to type I collagen and thus 
affect fibril formation, and to inhibit the cell attachment-promoting activity of collagen and fibrinogen by binding to such 
molecules near their cell binding sites. Chimeric proteins which contain a decorin moiety act to reduce scarring of 
healing tissue. The primary structure of the core protein of decorin has been deduced from cloned cDNA. See Krusius
et al., Primary Structure of an Extracellular Matrix Proteoglycan Core Protein Deduced from Cloned cDNA, Proc. Natl.
Acad. Sci. (USA), Vol. 83, pp. 7683-7687 (1986) incorporated herein by reference.

[0142] A decorin/EMP chimeric protein incorporates the known activities of decorin and provides integral scaffolding
or substratum of the EMP as described above to yield a composition which allows sustained release focal delivery to
target sites. Figs. 17A-17B illustrate a decorin/collagen IA chimeric protein (SEQ. ID. NO. 10) in which the collagen
domain includes amino acids 1-1057 and the decorin mature protein includes amino acids 1060-1388. Fig. 18 illustrates
a decorin peptide/collagen IA chimeric protein (SEQ. ID. NO. 11) in which the collagen helical domain includes amino
acids 1-1057 and the decorin peptide fragment includes amino acids 1060-1107. The decorin peptide fragment is
composed of P46 to G93 of the mature form of decorin.

[0143] Further provided is a chimeric DNA construct which includes a gene encoding decorin or one or more fragments
thereof, optionally ligated via a DNA linker sequence to a DNA sequence encoding an EMP such as collagen
(I-IV), fibrin, fibrinogen, fibronectin, elastin or laminin. A preferred chimeric DNA construct combines DNA encoding
decorin, a DNA linker sequence, and DNA encoding collagen I(α1). A chimeric DNA construct containing a decorin
gene and a collagen I(α1) gene is shown in Figs. 19A-19D (SEQ. ID. NO. 12). The illustrated construct includes an
XmnI linker sequence (bp 1-19), DNA encoding a collagen domain (bp 20-3190), a BglII linker sequence (bp
3191-3196), DNA encoding a mature form of decorin (bp 3197-4186) and a PstI linker sequence. A chimeric DNA
construct containing a decorin peptide gene and a collagen I(α1) gene is shown in Figs. 20A-20C (SEQ. ID. NO. 13).
The illustrated construct includes an XmnI linker sequence (bp 1-19), DNA encoding a collagen domain (bp 20-3190),
a BglII linker sequence (bp 3191-3196), DNA encoding a peptide fragment of decorin (bp 3197-3343), and a PstI linker
sequence (bp 3344-3349).

[0144] The coding sequence for an EMP may be ligated upstream and/or downstream and in-frame with a coding
sequence for decorin. The DNA encoding the EMP may encode a portion or fragment of the EMP or may encode the
entire EMP. Likewise, the DNA encoding decorin may be a fragment thereof or the entire gene. Furthermore, two or
more different EMPs may be ligated upstream and/or downstream from the DNA encoding the decorin moiety.
[0145] Any of the above-described chimeric DNA constructs may be incorporated into a suitable cloning vector. Fig.
21 depicts a pMal cloning vector containing a polylinker cloning site. Examples of cloning vectors are the plasmids
pMal-p2 and pMal-c2 (commercially available from New England Biolabs). The desired chimeric DNA construct is
incorporated into a polylinker sequence of the plasmid which contains certain useful restriction endonuclease sites
which are depicted in Fig. 22 (SEQ. ID. NO. 14). The pMal-p2 polylinker sequence has XmnI, EcoRI, BamHI, HindIII,
XbaI, SalI and PstI restriction endonuclease sites which are depicted in Fig. 22. The polylinker sequence is digested
with an appropriate restriction endonuclease and the chimeric construct is incorporated into the cloning vector by
ligating it to the DNA sequences of the plasmid. The chimeric DNA construct may be joined to the plasmid by digesting
the ends of the DNA construct and the plasmid with the same restriction endonuclease to generate "sticky ends" having
5' phosphate and 3' hydroxyl groups which allow the DNA construct to anneal to the cloning vector. Gaps between the
inserted DNA construct and the plasmid are then sealed with DNA ligase. Other techniques for incorporating the DNA
construct into plasmid DNA include blunt end ligation, poly(dA.dT) tailing techniques, and the use of chemically
synthesized linkers. An alternative method for introducing the chimeric DNA construct into a cloning vector is to incorporate
the DNA encoding the extracellular matrix protein into a cloning vector already containing a gene encoding a
therapeutically active moiety.
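Before choosing a pair of polylinker sites for directional cloning, it is useful to confirm that each enzyme cuts the sequence only once. The following is a minimal sketch under stated assumptions: the recognition sequences are the standard ones for the enzymes named above, but `TOY_POLYLINKER` is a made-up sequence used only for illustration (it is not the pMal-p2 polylinker of Fig. 22), and the function names are hypothetical.

```python
import re

# Standard recognition sequences for the enzymes named in the text
# (N = any base). Only the top strand is scanned; all of these sites
# are palindromic, so that is sufficient here.
RECOGNITION_SITES = {
    "XmnI": "GAANNNNTTC",
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
    "SalI": "GTCGAC",
    "PstI": "CTGCAG",
}

# Hypothetical toy sequence, NOT the real polylinker.
TOY_POLYLINKER = "GAAGGATTTCAGAATTCGGATCCTCTAGAGTCGACCTGCAGGCAAGCTT"


def site_to_regex(site):
    """Compile a recognition sequence into a regex that allows overlapping hits."""
    return re.compile("(?=(" + site.replace("N", "[ACGT]") + "))")


def find_sites(seq):
    """Return {enzyme: [0-based start positions]} for every site in seq."""
    seq = seq.upper()
    return {
        enzyme: [m.start() for m in site_to_regex(site).finditer(seq)]
        for enzyme, site in RECOGNITION_SITES.items()
    }


def unique_cutters(seq):
    """Enzymes that cut seq exactly once, in alphabetical order."""
    return sorted(e for e, hits in find_sites(seq).items() if len(hits) == 1)


if __name__ == "__main__":
    for enzyme, hits in find_sites(TOY_POLYLINKER).items():
        print(f"{enzyme:8s} {hits}")
```

The same scan can be run on an insert to verify that the enzymes chosen for cloning do not also cut within the chimeric construct.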

[0146] The cloning sites in the above-identified polylinker site allow the cDNA for the collagen I(α1)/BMP-2B chimeric
protein illustrated in Figs. 14A-14C (SEQ. ID. NO. 7) to be inserted between the XmnI and the HindIII sites. The cDNA
encoding the collagen I(α1)/TGF-β1 protein illustrated in Figs. 16A-16C (SEQ. ID. NO. 9) is inserted between the XmnI
and the XbaI sites. The cDNA encoding the collagen I(α1)/decorin protein illustrated in Figs. 19A-19D (SEQ. ID. NO.
12) is inserted between the XmnI and the PstI sites. The cDNA encoding the collagen I(α1)/decorin peptide illustrated in
Figs. 20A-20C (SEQ. ID. NO. 13) is inserted between the XmnI and PstI sites.

[0147] Plasmids containing the chimeric DNA construct are identified by standard techniques such as gel
electrophoresis. Procedures and materials for preparation of recombinant vectors, transformation of host cells with the vectors,
and host cell expression of polypeptides are described in Sambrook et al., Molecular Cloning: A Laboratory Manual,
supra. Generally, prokaryotic or eukaryotic host cells may be transformed with the recombinant DNA plasmids.
Transformed host cells may be located through phenotypic selection genes of the cloning vector which provide resistance
to a particular antibiotic when the host cells are grown in a culture medium containing that antibiotic.
[0148] Transformed host cells are isolated and cultured to promote expression of the chimeric protein. The chimeric
protein may then be isolated from the culture medium and purified by various methods such as dialysis, density gradient
centrifugation, liquid column chromatography, isoelectric precipitation, solvent fractionation, and electrophoresis.
However, purification of the chimeric protein by affinity chromatography is preferred whereby the chimeric protein is purified
by ligating it to a binding protein and contacting it with a ligand or substrate to which the binding protein has a specific
affinity.

[0149] In order to obtain more effective expression of mammalian or human eukaryotic genes in bacteria
(prokaryotes), the mammalian or human gene may be placed under the control of a bacterial promoter. A protein fusion and
purification system is employed to obtain the chimeric protein. Preferably, any of the above-described chimeric DNA
constructs is cloned into a pMal vector at a site in the vector's polylinker sequence. As a result, the chimeric DNA
construct is operably fused with the malE gene of the pMal vector. The malE gene encodes maltose binding protein
(MBP). Fig. 23 depicts a pMal cloning vector containing a BMP/collagen DNA construct. A spacer sequence coding
for 10 asparagine residues is located between the malE sequence and the polylinker sequence. This spacer sequence
insulates MBP from the protein of interest. Figs. 24, 25 and 26 depict pMal cloning vectors containing DNA encoding
collagen chimeras with TGF-β1, decorin and a decorin peptide, respectively. The pMal vector containing any of the
chimeric DNA constructs fused to the malE gene is transformed into E. coli.

[0150] The E. coli is cultured in a medium which induces the bacteria to produce the maltose-binding protein fused
to the chimeric protein. This technique utilizes the Ptac promoter of the pMal vector. The MBP contains a 26 amino acid
N-terminal signal sequence which directs the MBP-chimeric protein through the E. coli cytoplasmic membrane. The
protein can then be purified from the periplasm. Alternatively, the pMal-c2 cloning vector can be used with this protein
fusion and purification system. The pMal-c2 vector contains an exact deletion of the malE signal sequence which
results in cytoplasmic expression of the fusion protein. A crude cell extract containing the fusion protein is prepared
and poured over a column of amylose resin. Since MBP has an affinity for amylose, it binds to the resin. Alternatively,
the column can include any substrate for which MBP has a specific affinity. Unwanted proteins present in the crude
extract are washed through the column. The MBP fused to the chimeric protein is eluted from the column with a neutral
buffer containing maltose or other dilute solution of a desorbing agent for displacing the hybrid polypeptide. The purified
MBP-chimeric protein is then treated with a protease such as factor Xa to cleave the MBP from the chimeric
protein. The pMal-p2 plasmid has a sequence encoding the recognition site for protease factor Xa which cleaves after
the amino acid sequence Isoleucine-Glutamic acid-Glycine-Arginine of the polylinker sequence.
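The factor Xa step can be pictured as cutting the fusion immediately after the Ile-Glu-Gly-Arg recognition sequence. The following is a minimal sketch for illustration only: the function name is hypothetical and `TOY_FUSION` is an invented one-letter sequence (a short placeholder for MBP, the ten-asparagine spacer, the recognition site, and a short collagen-like peptide), not a sequence from the figures.

```python
FACTOR_XA_SITE = "IEGR"  # Ile-Glu-Gly-Arg in one-letter code

# Invented toy fusion: placeholder MBP fragment + 10-Asn spacer + site + peptide.
TOY_FUSION = "MKIEEGKL" + "N" * 10 + FACTOR_XA_SITE + "GPPGLAGPPGESG"


def cleave_after_site(fusion, site=FACTOR_XA_SITE):
    """Split a protein sequence immediately after the first occurrence of site.

    Returns (carrier_part_including_site, released_part).
    """
    index = fusion.find(site)
    if index == -1:
        raise ValueError("recognition site not found")
    cut = index + len(site)
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    carrier, released = cleave_after_site(TOY_FUSION)
    print("carrier: ", carrier)
    print("released:", released)
```

Because cleavage occurs after the arginine, the released protein carries no residues from the recognition site at its N-terminus in this model.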

[0151] The chimeric protein is then separated from the cleaved MBP by passing the mixture over an amylose column.
An alternative method for separating the MBP from the chimeric protein is by ion exchange chromatography. This
system yields up to 100 mg of MBP-chimeric protein per liter of culture. See Riggs, P., in Ausubel, F.M., Kingston, R.E.,
Moore, D.D., Seidman, J.G., Smith, J.A., Struhl, K. (eds.) Current Protocols in Molecular Biology, Supplement 19
(16.6.1-16.6.10) (1990) Greene Associates/Wiley Interscience, New York, New England Biolabs (cat # 800-65S
9pMALc2) pMal protein fusion and purification system hereby incorporated herein by reference. (See also European
Patent No. 286 239 herein incorporated by reference which discloses a similar method for production and purification
of a protein such as collagen.)

[0152] Other protein fusion and purification systems may be employed to produce chimeric proteins. Prokaryotes
such as E. coli are the preferred host cells for expression of the chimeric protein. However, systems which utilize
eukaryote host cell lines are also acceptable such as yeast, human, mouse, rat, hamster, monkey, amphibian, insect,
algae, and plant cell lines. For example, HeLa (human epithelial), 3T3 (mouse fibroblast), CHO (Chinese hamster
ovary), and SP 2 (mouse plasma cell) are acceptable cell lines. The particular host cells that are chosen should be
compatible with the particular cloning vector that is chosen.
[0153] Another acceptable protein expression system is the Baculovirus Expression System manufactured by
Invitrogen of San Diego, California. Baculoviruses form prominent crystal occlusions within the nuclei of cells they infect.
Each crystal occlusion consists of numerous virus particles enveloped in a protein called polyhedrin. In the baculovirus
expression system, the native gene encoding polyhedrin is substituted with a DNA construct encoding a protein or
peptide having a desired activity. The virus then produces large amounts of protein encoded by the foreign DNA
construct. The preferred cloning vector for use with this system is pBlueBac III (obtained from Invitrogen of San Diego,
California). The baculovirus system utilizes the Autographa californica multiple nuclear polyhedrosis virus (AcMNPV)
regulated polyhedrin promoter to drive expression of foreign genes. The chimeric gene, i.e., the DNA construct encoding
the chimeric protein, is inserted into the pBlueBac III vector immediately downstream from the baculovirus polyhedrin
promoter.

[0154] The pBlueBac III transfer vector contains a β-galactosidase reporter gene which allows for identification of
recombinant virus. The β-galactosidase gene is driven by the baculovirus ETL promoter (P_ETL) which is positioned in
opposite orientation to the polyhedrin promoter (P_PH) and the multiple cloning site of the vector. Therefore, recombinant
virus coexpresses β-galactosidase and the chimeric gene.

[0155] Spodoptera frugiperda (Sf9) insect cells are then cotransfected with wild type viral DNA and the pBlueBac III
vector containing the chimeric gene. Recombination sequences in the pBlueBac III vector direct the vector's integration
into the genome of the wild type baculovirus. Homologous recombination occurs resulting in replacement of the native
polyhedrin gene of the baculovirus with the DNA construct encoding the chimeric protein. Wild type baculovirus which
do not contain foreign DNA express the polyhedrin protein in the nuclei of the infected insect cells. However, the
recombinants do not produce polyhedrin protein and do not produce viral occlusions. Instead, the recombinants produce
the chimeric protein.

[0156] Alternative insect host cells for use with this expression system are Sf21 cell line derived from Spodoptera 
frugiperda and High Five cell lines derived from Trichoplusia ni. 

[0157] Other acceptable cloning vectors include phages, cosmids or artificial chromosomes. For example,
bacteriophage lambda is a useful cloning vector. This phage can accept pieces of foreign DNA up to about 20,000 base pairs
in length. The lambda phage genome is a linear double stranded DNA molecule with single stranded complementary
(cohesive) ends which can hybridize with each other when inside an infected host cell. The lambda DNA is cut with a
restriction endonuclease and the foreign DNA, e.g. the DNA to be cloned, is ligated to the phage DNA fragments. The
resulting recombinant molecule is then packaged into infective phage particles. Host cells are infected with the phage
particles containing the recombinant DNA. The phage DNA replicates in the host cell to produce many copies of the
desired DNA sequence.

[0158] Cosmids are hybrid plasmid/bacteriophage vectors which can be used to clone DNA fragments of about
40,000 base pairs. Cosmids are plasmids which have one or more DNA sequences called "cos" sites derived from
bacteriophage lambda for packaging lambda DNA into infective phage particles. Two cosmids are ligated to the DNA
to be cloned. The resulting molecule is packaged into infective lambda phage particles and transfected into bacterial
host cells. When the cosmids are inside the host cell they behave like plasmids and multiply under the control of a
plasmid origin of replication. The origin of replication is a sequence of DNA which allows a plasmid to multiply within
a host cell.

[0159] Yeast artificial chromosome vectors are similar to plasmids but allow for the incorporation of much larger DNA
sequences of about 400,000 base pairs. The yeast artificial chromosomes contain sequences for replication in yeast.
The yeast artificial chromosome containing the DNA to be cloned is transformed into yeast cells where it replicates
thereby producing many copies of the desired DNA sequence. Where phage, cosmids, or yeast artificial chromosomes
are employed as cloning vectors, expression of the chimeric protein may be obtained by culturing host cells that have
been transfected or transformed with the cloning vector in a suitable culture medium.

[0160] Chimeric proteins disclosed herein are intended for use in treating mammals or other animals. The
therapeutically active moieties described above, e.g., osteogenic agents such as BMPs, TGFs, decorin, and/or fragments of
each of them, are all to be considered as being or having been derived from physiologically active agents for purposes
of this description. The chimeric proteins and DNA constructs which incorporate a domain derived from one or more
cellular physiologically active agents can be used for in vivo therapeutic treatment, in vitro research or for diagnostic
purposes in general.

[0161] When used in vivo, formulations containing the present chimeric proteins may be placed in direct contact with
viable tissue, including bone, to induce or enhance growth, repair and/or replacement of such tissue. This may be
accomplished by applying a chimeric protein directly to a target site during surgery. It is contemplated that minimally
invasive techniques such as endoscopy are to be used to apply a chimeric protein to a desired location. Formulations
containing the chimeric proteins disclosed herein may consist solely of one or more chimeric proteins or may also
incorporate one or more pharmaceutically acceptable adjuvants.

[0162] In an alternate embodiment, any of the above-described chimeric proteins may be contacted with, adhered
to, or otherwise incorporated into an implant such as a drug delivery device or a prosthetic device. Chimeric proteins
may be microencapsulated or macroencapsulated by liposomes or other membrane forming materials such as alginic
acid derivatives prior to implantation and then implanted in the form of a pouchlike implant. The chimeric protein may
be microencapsulated in structures in the form of spheres, aggregates of core material embedded in a continuum of
wall material or capillary designs. Microencapsulation techniques are well known in the art and are described in the
Encyclopedia of Polymer Science and Engineering, Vol. 9, pp. 724 et seq. (1980) hereby incorporated herein by
reference.

[0163] Chimeric proteins may also be coated on or incorporated into medically useful materials such as meshes,
pads, felts, dressings or prosthetic devices such as rods, pins, bone plates, artificial joints, artificial limbs or bone
augmentation implants. The implants may, in part, be made of biocompatible materials such as glass, metal, ceramic,
calcium phosphate or calcium carbonate based materials. Implants having biocompatible biomaterials are well known
in the art and are all suitable for use herein. Implant biomaterials derived from natural sources such as protein fibers,
polysaccharides, and treated naturally derived tissues are described in the Encyclopedia of Polymer Science and
Engineering, Vol. 2, pp. 267 et seq. (1989) hereby incorporated herein by reference. Synthetic biocompatible polymers
are well known in the art and are also suitable implant materials. Examples of suitable synthetic polymers include
urethanes, olefins, terephthalates, acrylates, polyesters and the like. Other acceptable implant materials are
biodegradable hydrogels or aggregations of closely packed particles such as polymethylmethacrylate beads with a
polymerized hydroxyethyl methacrylate coating. See the Encyclopedia of Polymer Science and Engineering, Vol. 2, pp. 267
et seq. (1989) hereby incorporated herein by reference.

[0164] The chimeric protein herein provides a useful way for immobilizing or coating a physiologically active agent
on a pharmaceutically acceptable vehicle to deliver the physiologically active agent to desired sites in viable tissue.
Suitable vehicles include those made of bioabsorbable polymers, biocompatible nonabsorbable polymers, lactoner
putty and plaster of Paris. Examples of suitable bioabsorbable and biocompatible polymers include homopolymers,
copolymers and blends of hydroxyacids such as lactide and glycolide, other absorbable polymers which may be used
alone or in combination with hydroxyacids including dioxanones, carbonates such as trimethylene carbonate, lactones
such as caprolactone, polyoxyalkylenes, and oxylates. See the Encyclopedia of Polymer Science and Engineering,
Vol. 2, pp. 230 et seq. (1989) hereby incorporated herein by reference.

[0165] These vehicles may be in the form of beads, particles, putty, coatings or film vehicles. Diffusional systems in
which a core of chimeric protein is surrounded by a porous membrane layer are other acceptable vehicles.

[0166] In another aspect, the amount of amino acid analog(s) transported into a target cell can be regulated by
controlling the tonicity of the growth media. A hypertonic growth medium increases uptake of trans-4-hydroxyproline into E.
coli as illustrated in Figure 2A. All known methods of increasing osmolality of growth media are appropriate for use
herein including addition of salts such as sodium chloride, KCl, MgCl2 and the like, and sugars such as sucrose,
glucose, maltose, etc. and polymers such as polyethylene glycol (PEG), dextran, cellulose, etc. and amino acids such
as glycine. Increasing the osmolality of growth media results in greater intracellular concentration of amino acid
analog(s) and a higher degree of complexation of amino acid analog(s) to tRNA. As a consequence, proteins produced by
the cell achieve a higher degree of incorporation of amino acid analogs. Figure 12 illustrates percentage of incorporation
of proline and hydroxyproline into MBP under isotonic and hypertonic media conditions in comparison to proline in
native MBP. Thus, manipulating osmolality, in addition to adjusting concentration of amino acid analog(s) in growth
media, allows a dual-faceted approach to regulating their uptake into prokaryotic cells and eukaryotic cells as described
above and consequent incorporation into target polypeptides.
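To compare how much different osmolytes raise the osmotic strength of a medium, a rough estimate can be made from the number of particles each solute releases in solution. The following is a minimal sketch and not part of the disclosure: it assumes ideal behavior and complete dissociation (real solutions deviate from this, especially at high salt concentrations), and the function and table names are hypothetical.

```python
# Ideal van't Hoff factors: particles released per formula unit, assuming
# complete dissociation. This is an idealization, not a measured value.
VANT_HOFF_FACTOR = {
    "NaCl": 2,
    "KCl": 2,
    "MgCl2": 3,
    "sucrose": 1,
    "glucose": 1,
    "glycine": 1,
}


def ideal_osmolarity_mosm(solutes_mM):
    """Estimate the osmolarity contribution (mOsm/L) of added solutes.

    solutes_mM maps a solute name to its concentration in mM.
    """
    return sum(VANT_HOFF_FACTOR[name] * conc for name, conc in solutes_mM.items())


if __name__ == "__main__":
    for recipe in ({"NaCl": 500}, {"KCl": 500}, {"sucrose": 1000}, {"MgCl2": 100}):
        print(recipe, "->", ideal_osmolarity_mosm(recipe), "mOsm/L (ideal)")
```

Under this approximation 500 mM NaCl and 1000 mM sucrose contribute the same osmotic strength, which is one way to choose concentrations when substituting one osmolyte for another.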

[0167] Any growth media can be used herein including commercially available growth media such as M9 minimal 
medium (available from Gibco Life Technologies, Inc.), LB medium, NZCYM medium, terrific broth, SOB medium and 
others that are well known in the art. 

[0168] Collagen from different tissues can contain different amounts of trans-4-hydroxyproline. For example, tissues
that require greater strength such as bone contain a higher number of trans-4-hydroxyproline residues than collagen
in tissues requiring less strength, e.g., skin. The present system provides a method of adjusting the amount of
trans-4-hydroxyproline in collagen, collagen fragments, collagen-like peptides, and chimeric peptides having a collagen
domain, collagen fragment domain or collagen-like peptide domain fused to a physiologically active domain, since by
increasing or decreasing the concentration of trans-4-hydroxyproline in growth media, the amount of
trans-4-hydroxyproline incorporated into such polypeptides is increased or decreased accordingly. The collagen, collagen fragments,
collagen-like peptides and above-described chimeric peptides can be expressed with predetermined levels of
trans-4-hydroxyproline. In this manner physical characteristics of an extracellular matrix can be adjusted based upon requirements of
end use. Without wishing to be bound by any particular theory, it is believed that incorporation of trans-4-hydroxyproline
into the EMP moieties herein provides a basis for self aggregation as described herein.

[0169] In another aspect, the combination of incorporation of trans-4-hydroxyproline into collagen and fragments
thereof using hyperosmotic media and genes which have been altered such that codon usage more closely reflects
that found in E. coli, but retaining the amino acid sequence found in native human collagen, surprisingly resulted in
production by E. coli of human collagen and fragments thereof which were capable of self aggregation.
[0170] The human collagen Type I (α1) gene sequence (Figure 27A-27E) (SEQ. ID. NO. 15) contains a large number
of glycine and proline codons (347 glycine and 240 proline codons) arranged in a highly repetitive manner. Table I
below is a codon frequency tabulation for the human Type I (α1) collagen gene. Of particular note is that the GGA
glycine codon occurs 64 times and the CCC codon for proline occurs 93 times. Both of these codons are considered
to be rare codons in E. coli. See, Sharp, P.M. and W.-H. Li. Nucleic Acids Res. 14: 7737-7749, 1986. These, and similar
considerations for other human collagen genes are shown herein to account for the difficulty in expressing human
collagen genes in E. coli.
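A codon frequency tabulation of the kind described can be produced by splitting a coding sequence into triplets and counting them. The following is a minimal sketch for illustration: the function names are hypothetical, `TOY_CDS` is an invented six-codon sequence rather than any part of SEQ. ID. NO. 15, and only the two codons singled out in the text (GGA and CCC) are flagged.

```python
from collections import Counter

# The two codons the text identifies as rare in E. coli.
FLAGGED_CODONS = {"GGA": "Gly", "CCC": "Pro"}

# Invented toy coding sequence (six codons), for illustration only.
TOY_CDS = "GGACCCCCTGGACCCGGT"


def codon_counts(cds):
    """Count in-frame codons in a coding sequence (DNA or RNA letters)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def flagged_codon_report(cds):
    """Return {codon: count} for the flagged codons only."""
    counts = codon_counts(cds)
    return {codon: counts.get(codon, 0) for codon in FLAGGED_CODONS}


if __name__ == "__main__":
    print(dict(codon_counts(TOY_CDS)))
    print(flagged_codon_report(TOY_CDS))
```

Run on a full-length gene, the same report would give the per-codon totals that a tabulation such as Table I summarizes.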
[0171] In a first step, the sequence of the heterologous collagen gene is changed to reflect the codon bias in E. coli
as given in codon usage tables (e.g. Ausubel et al., (1995) Current Protocols in Molecular Biology, John Wiley & Sons,
New York, New York; Wada et al., 1992, supra). Rare E. coli codons (See, Sharp, P.M. and W.-H. Li. Nucleic Acids
Res. 14: 7737-7749, 1986) are avoided. Second, unique restriction enzyme sites are chosen that are located
approximately every 120-150 base pairs in the sequence. In certain cases this entails altering the nucleotide sequence but
does not change the amino acid sequence. Third, oligos of approximately 80 nucleotides are synthesized such that
when two such oligos are annealed together and extended with a DNA polymerase they reconstruct an approximately
120-150 base pair section of the gene (Figure 28). The section of the gene encoding the very amino terminal portion
of the protein has an initiating methionine (ATG) codon at the 5' end and a unique restriction site followed by a stop
(TAAT) signal at the 3' end. The remaining sections have unique restriction sites at the 5' end and unique restriction
sites followed by a TAAT stop signal at the 3' end. The gene is assembled by sequential addition of each section to the
preceding 5' section. In this manner, each successively larger section can be independently constructed and expressed.
Figure 28 is a schematic representation of the construction of the human collagen gene starting from synthetic oligos.
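The third step, in which two oligos are annealed and filled in to give one section of the gene, can be modeled in a few lines. The following is a minimal sketch under stated assumptions: it assumes the two oligos anneal through complementary 3' ends, the function names are hypothetical, and `TOY_SECTION` is an invented 30-base sequence standing in for a real 120-150 base pair section.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Invented 30-base toy section; real sections are about 120-150 bp.
TOY_SECTION = "ATGGCTAGCAAAGGTGAACTGTTCACCTAA"


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extend_pair(forward_oligo, reverse_oligo, min_overlap=10):
    """Top strand of the duplex formed when the 3' ends of two oligos anneal
    and both strands are filled in by a polymerase.

    The longest perfect overlap of at least min_overlap bases is used.
    """
    forward = forward_oligo.upper()
    reverse_as_top = reverse_complement(reverse_oligo)
    longest = min(len(forward), len(reverse_as_top))
    for k in range(longest, min_overlap - 1, -1):
        if forward[-k:] == reverse_as_top[:k]:
            return forward + reverse_as_top[k:]
    raise ValueError("no overlap of at least min_overlap bases found")


def overlap_needed(oligo_length, section_length):
    """Overlap required for two equal-length oligos to span one section."""
    return 2 * oligo_length - section_length


if __name__ == "__main__":
    forward = TOY_SECTION[:20]
    reverse = reverse_complement(TOY_SECTION[10:])
    print(extend_pair(forward, reverse) == TOY_SECTION)
    print(overlap_needed(80, 120), overlap_needed(80, 150))
```

For two 80-nucleotide oligos, sections of 120 to 150 base pairs correspond to overlaps of 40 down to 10 bases.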
[0172] A fragment of the human Type I α1 collagen chain fused to the C-terminus of glutathione S-transferase (GST-D4,
Fig. 29) (SEQ. ID. NO. 18) was prepared and tested for expression in E. coli strain JM109 (F-) under conditions
of hyperosmotic shock. The collagen fragment included the C-terminal 193 amino acids of the triple helical region and
the 26 amino acid C-terminal telopeptide. Fig. 29 is a schematic of the amino acid sequence of the GST-ColECol (SEQ.
ID. NO. 17) and GST-D4 (SEQ. ID. NO. 18) fusion proteins. ColECol comprises the 17 amino acid N-terminal
telopeptide, 338 Gly-X-Y repeating tripeptides, and the 26 amino acid C-terminal telopeptide. There is a unique methionine
at the junction of GST and D4, followed by 64 Gly-X-Y repeats, and the 26 amino acid telopeptide. The residue (Phe199)
in the C-terminal telopeptide of D4 where pepsin cleaves is indicated. The gene for the collagen fragment was
synthesized from synthetic oligonucleotides designed to reflect optimal E. coli codon usage. Fig. 30 is a table depicting occurrence
of the four proline and four glycine codons in the human Type I α1 gene (HCol) and the Type I α1 gene with optimized
E. coli codon usage (ColECol). Usage of the remaining codons in ColECol was also optimized for E. coli expression
according to Wada et al., supra. Protein GST-D4 was efficiently expressed in JM109 (F-) in minimal media lacking
proline but supplemented with Hyp and NaCl (See Figs. 31 and 32). Expression was dependent on induction with
isopropyl-1-thio-β-galactopyranoside (IPTG), trans-4-hydroxyproline and NaCl. At a fixed NaCl concentration of 500
mM, expression was minimal at trans-4-hydroxyproline concentrations below ~20 mM while the expression level
plateaued at trans-4-hydroxyproline concentrations above 40 mM. See Fig. 31 which depicts a gel showing expression
and dependence of expression of GST-D4 on hydroxyproline. The concentration of hydroxyproline is indicated above
each lane. Osmolyte (NaCl) was added at 500 mM in each culture and each was induced with 1.5 mM IPTG. The arrow
marks the position of GST-D4. Likewise, at a fixed trans-4-hydroxyproline concentration of 40 mM, NaCl concentrations
below 300 mM resulted in little protein accumulation and expression decreased above 700-800 mM NaCl. See Fig. 32
which depicts a gel showing expression of GST-D4 in hyperosmotic media. Lanes 2 and 3 are uninduced and induced
samples, respectively, each without added osmolyte. The identity and quantity of osmolyte is indicated above each of
the other lanes. Trans-4-hydroxyproline was added at 40 mM in each culture and all cultures except that in lane 1 were
induced with 1.5 mM IPTG. The arrow marks the position of GST-D4.

[0173] Either sucrose or KCl can be substituted for NaCl as the osmolyte (See Fig. 32). Thus, the osmotic
shock-mediated intracellular accumulation of trans-4-hydroxyproline was a critical determinant of expression rather than the
precise chemical identity of the osmolyte. Despite the large number of prolines (66) in GST-D4, its size (46 kDa), and
non-optimal growth conditions, it was expressed at ~10% of the total cellular protein. Expressed proteins of less than
full-length indicative of aborted transcription, translation, or mRNA instability were not detected.
[0174] The gene for protein D4 contains 52 proline codons. In the expression experiments reflected in Figs. 31 and
32, it was expected that trans-4-hydroxyproline would be inserted at each of these codons resulting in a protein where
trans-4-hydroxyproline had been substituted for all prolines. To confirm this, GST-D4 was cleaved with BrCN in 0.1 N
HCl at methionines within GST and at the unique methionine at the N-terminal end of D4, and D4 purified by reverse
phase HPLC. Crude GST-D4 was dissolved in 0.1 M HCl in a round bottom flask with stirring. Following addition of a
2-10 fold molar excess of clear, crystalline BrCN, the flask was evacuated and filled with nitrogen. Cleavage was
allowed to proceed for 24 hours, at which time the solvent was removed in vacuo. The residue was dissolved in 0.1%
trifluoroacetic acid (TFA) and purified by reverse-phase HPLC using a Vydac C4 RP-HPLC column (10 x 250 mm, 5
μm, 300 Å) on a BioCad Sprint system (Perceptive Biosystems, Framingham, MA). D4 was eluted with a gradient of 15
to 40% acetonitrile/0.1% TFA over a 45 min. period. D4 eluted as a single peak at 26% acetonitrile/0.1% TFA. Standard
BrCN cleavage conditions (70% formic acid) resulted in extensive formylation of D4, presumably at the hydroxyl groups
of the trans-4-hydroxyproline residues. Formylation of BrCN/formic acid-cleaved proteins had been noted before
(Beavis et al., Anal. Chem., 62, 1836 (1990)). Amino acid analysis was carried out on a Beckman ion exchange instrument
with post-column derivatization. N-terminal sequencing was performed on an Applied Biosystems sequencer equipped
with an on-line HPLC system. Electrospray mass spectra were obtained with a VG Biotech BIO-Q quadrupole analyzer
by M-Scan, Inc. (West Chester, PA). For CD thermal melts, the temperature was raised in 0.5°C increments from 4°C
to 85°C with a four minute equilibration between steps. Data were recorded at 221.5 nm. The thermal transition was
calculated using the program ThermoDyne (MORE). The electrospray mass spectroscopy of this protein gave a single
molecular ion corresponding to a mass of 20,807 Da. This mass is within 0.05% of that expected for D4 if it contains
100% trans-4-hydroxyproline in lieu of proline. Proline was not detected in amino acid analysis of purified D4, again
consistent with complete substitution of trans-4-hydroxyproline for proline. To confirm further that
trans-4-hydroxyproline substitution had only occurred at proline codons, the N-terminal 13 amino acids of D4 were sequenced as above.
The first 13 codons of D4 specify the protein sequence H2N-Gly-Pro-Pro-Gly-Leu-Ala-Gly-Pro-Pro-Gly-Glu-Ser-Gly
(SEQ. ID. NO. 41). The sequence found was H2N-Gly-Hyp-Hyp-Gly-Leu-Ala-Gly-Hyp-Hyp-Gly-Glu-Ser-Gly (SEQ. ID.
NO. 42), see Fig. 69. Taken together, these results indicate that trans-4-hydroxyproline (Hyp) was inserted only at
proline codons and that the fidelity of the E. coli translational machinery was not otherwise altered by either the high
intracellular concentration of trans-4-hydroxyproline or hyperosmotic culture conditions.
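The mass argument rests on the fact that hydroxyproline differs from proline by a single oxygen atom, so each substituted site adds about 16 Da. The following is a minimal sketch of that arithmetic, not a reproduction of the analysis actually performed: the function names are hypothetical, and the tolerance simply restates the 0.05% agreement quoted in the text.

```python
OXYGEN_AVERAGE_MASS = 15.999  # Da; Hyp (C5H9NO3) = Pro (C5H9NO2) + one O


def hyp_mass_shift(n_proline_sites, fraction_substituted=1.0):
    """Mass added (Da) when a fraction of proline sites carry Hyp instead."""
    return n_proline_sites * fraction_substituted * OXYGEN_AVERAGE_MASS


def within_tolerance(observed, expected, relative_tolerance=0.0005):
    """True if observed is within the relative tolerance of expected."""
    return abs(observed - expected) <= relative_tolerance * expected


def substitute_hyp(three_letter_sequence):
    """Replace every Pro with Hyp in a hyphen-separated three-letter sequence."""
    return "-".join(
        "Hyp" if residue == "Pro" else residue
        for residue in three_letter_sequence.split("-")
    )


if __name__ == "__main__":
    observed = 20807.0
    print("shift for 52 sites:", round(hyp_mass_shift(52), 1), "Da")
    print("0.05% of observed mass:", round(0.0005 * observed, 1), "Da")
    print(substitute_hyp("Gly-Pro-Pro-Gly-Leu-Ala-Gly-Pro-Pro-Gly-Glu-Ser-Gly"))
```

For the 52 proline codons of D4, complete substitution adds roughly 832 Da, whereas 0.05% of 20,807 Da is only about 10 Da, so a largely unsubstituted protein would fall well outside the stated agreement.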

[0175] To determine whether D4, containing trans-4-hydroxyproline in both the X and Y positions, forms homotrimeric
helices and to compare stability to native collagen, the following was noted: In neutral pH phosphate buffer, D4 exhibits
a circular dichroism (CD) spectrum characteristic of a triple helix (See Fig. 33 and Bhatnagar et al., Circular Dichroism
and the Conformational Analysis of Biomolecules, G.D. Fasman, Ed. Plenum Press, New York, (1996) p. 183). Fig. 33
illustrates circular dichroism spectra of native and heat-denatured D4 in neutral phosphate buffer. HPLC-purified D4
was dissolved in 0.1 M sodium phosphate, pH 7.0, to a final concentration of 1 mg/mL (ε280 = 3628 M^-1·cm^-1). The
solution was incubated at 4°C for two days to allow triple helices to form prior to analysis. Spectra were obtained on
an Aviv model 62DS spectropolarimeter (Yale University, Molecular Biophysics and Biochemistry Department). A 1
mm path length quartz suprasil fluorimeter cell was used. Following a 10 min. incubation period at 4°C, standard
wavelength spectra were recorded from 260 to 190 nm using 10 sec acquisition times and 0.5 nm scan steps. This
spectrum is characterized by a negative ellipticity at 198 nm and a positive ellipticity at 221 nm. The magnitudes of
both of these absorbances were greater in neutral pH buffer compared to acidic conditions. Comparable dependence
of stability on pH has been noted for collagen-like triple helices. See, e.g., Venugopal et al., Biochemistry, 33, 7948
(1994). Heating at 85°C for five minutes prior to obtaining the CD spectrum decreased the magnitude of the absorbance
at 198 nm and abolished the absorbance at 221 nm (Fig. 33). This behavior is also typical of the triple helical structure
of collagen. See, R.S. Bhatnagar et al., Circular Dichroism and the Conformational Analysis of Biomolecules G.D.
Fasman, Ed., supra. A thermal melt profile of D4 conducted as above in phosphate buffer gave a melting temperature
of about 29°C. A fragment of the C-terminal region of the bovine Type I α1 collagen chain comparable in length to D4
forms homotrimeric helices with a melting temperature of 26°C. (See, A. Rossi, et al., Biochemistry 35, 6048 (1996)).
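One common way to read a melting temperature from a thermal melt is to find where the signal crosses the midpoint between its folded and unfolded values. The following is a minimal sketch of that idea only: it is not the ThermoDyne calculation named in the text, the function name is hypothetical, and the curve is synthetic two-state data generated with a 29°C midpoint purely so the example has something to analyze. The temperature grid follows the stated 0.5°C steps from 4°C to 85°C.

```python
import math


def melting_temperature(temps, signal):
    """Temperature at which the signal crosses the midpoint between its first
    (folded) and last (unfolded) readings, by linear interpolation."""
    midpoint = (signal[0] + signal[-1]) / 2.0
    for i in range(len(temps) - 1):
        s0, s1 = signal[i], signal[i + 1]
        if s0 != s1 and (s0 - midpoint) * (s1 - midpoint) <= 0:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (midpoint - s0) * (t1 - t0) / (s1 - s0)
    raise ValueError("signal never crosses the midpoint")


# Temperature grid: 4 C to 85 C in 0.5 C increments (163 points).
temps = [4.0 + 0.5 * i for i in range(163)]

# Synthetic two-state melt (fraction folded), NOT measured data.
signal = [1.0 / (1.0 + math.exp((t - 29.0) / 2.0)) for t in temps]

if __name__ == "__main__":
    print("points:", len(temps), "| last temperature:", temps[-1])
    print("estimated Tm: %.2f C" % melting_temperature(temps, signal))
    print("total equilibration time: %d min" % (4 * (len(temps) - 1)))
```

With 162 steps and a four minute equilibration at each, a single melt of this kind takes on the order of eleven hours.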
[0176] Resistance to pepsin digestion is a second commonly used indication of triple helical structure. At 4°C, the 
majority of D4 is digested rapidly by pepsin to a protein of slightly lower molecular weight. Fig. 34 is a gel illustrating 

55 the result of digestion of D4 with bovine pepsin. Purified D4 was dissolved in 0.1 M sodium phosphate, pH 7.0, to 1 .6 
u.g/uJ and incubated at 4°C for 7 days. Aliquots (10 ^l) were placed into 1 .5 ml centrifuge tubes and adjusted with water 
and 1 M acetic acid solutions to 25 uJ final volume and 200 mM final acetic acid concentration. Each tube was then 
incubated for 20 min. at the indicated temperature and pepsin (0.5 u.l of a 0.25 \ig/\i\ solution) was added to each tube 



17 



/ 



EP 0 992 586 A2 

and digestion allowed to proceed for 45 minutes. Following digestion, samples were quenched with loading buffer and
analyzed by SDS-PAGE. However, the initial pepsin cleavage product is resistant to further digestion up to ~30°C.
Amino terminal sequencing as above of the initial pepsin cleavage product showed that the N-terminus was identical
to that of full-length D4. Mass spectral analysis as above of the digestion product gave a parent ion with a molecular
weight consistent with cleavage in the C-terminal telopeptide on the N-terminal side of Phe119 (See Fig. 29), suggesting
that this portion of the protein is either globular or of ill-defined structure and rapidly cleaved by pepsin while the triple
helical region is resistant to digestion. Thus, despite global trans-4-hydroxyproline for proline substitution in both the
X and Y positions, D4 formed triple helices of stability similar to comparably sized fragments of bovine collagen
containing Hyp at the normal percentage and only in the Y position.
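
The per-tube quantities in the pepsin digestion described above follow from simple dilution arithmetic. The sketch below works them out from the stated concentrations and volumes. It assumes that the 25 µl final volume refers to the tube before the 0.5 µl of pepsin is added; the function and key names are illustrative only.

```python
# Worked arithmetic for the pepsin digestion set-up (inputs taken from the text).
D4_CONC_UG_PER_UL = 1.6        # purified D4 stock concentration
ALIQUOT_UL = 10.0              # D4 aliquot placed in each tube
FINAL_VOLUME_UL = 25.0         # volume after adding water and acetic acid
ACETIC_STOCK_MM = 1000.0       # 1 M acetic acid stock
ACETIC_FINAL_MM = 200.0        # target acetic acid concentration
PEPSIN_UL = 0.5                # pepsin solution added per tube
PEPSIN_CONC_UG_PER_UL = 0.25   # pepsin stock concentration


def digestion_setup():
    """Return per-tube amounts.

    Assumption: the 25 µl final volume is reached before pepsin is added.
    """
    d4_ug = ALIQUOT_UL * D4_CONC_UG_PER_UL
    # C1 * V1 = C2 * V2 for the acetic acid stock
    acetic_ul = FINAL_VOLUME_UL * ACETIC_FINAL_MM / ACETIC_STOCK_MM
    water_ul = FINAL_VOLUME_UL - ALIQUOT_UL - acetic_ul
    pepsin_ug = PEPSIN_UL * PEPSIN_CONC_UG_PER_UL
    return {
        "d4_ug": d4_ug,
        "acetic_acid_ul": acetic_ul,
        "water_ul": water_ul,
        "pepsin_ug": pepsin_ug,
        "substrate_to_enzyme_w_w": d4_ug / pepsin_ug,
    }


if __name__ == "__main__":
    for name, value in digestion_setup().items():
        print(f"{name}: {value:g}")
```

Under these assumptions each tube holds 16 µg of D4, 5 µl of 1 M acetic acid, 10 µl of water and 0.125 µg of pepsin.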

[0177] The full-length human Type I α1 collagen chain, although more than four times the size of D4, was also expressed
as an N-terminal fusion with GST (GST-ColECol, Fig. 29) in JM109 (F-) in Hyp/NaCl media. Fig. 35 is a gel depicting
expression of GST-HCol and GST-ColECol. Trans-4-hydroxyproline was added at 40 mM and NaCl at 500 mM.
Expression was induced with 1.5 mM IPTG. The arrow marks the position of GST-ColECol. In the procedures resulting
in the gels shown in Figs. 31, 32 and 35, five ml cultures of JM109 (F-) harboring the expression plasmid in LB media
containing 100 µg/ml ampicillin were grown overnight. Cultures were centrifuged and the cell pellets washed twice with
five ml of M9/Amp media (See, J. Sambrook, E.F. Fritsch, T. Maniatis, Molecular Cloning: A Laboratory Manual. (Cold
Spring Harbor Laboratory, Cold Spring Harbor, NY, 1989)) supplemented with 0.5% glucose and 100 µg/ml of all amino
acids except glycine and alanine, which were at 200 µg/ml, and containing no proline. The cells were finally resuspended
in five ml of the above media. Following incubation at 37°C for 30 min., hydroxyproline, osmolyte, or IPTG were added
as indicated. After four hours, aliquots of the cultures were analyzed by SDS-PAGE.

[0178] Like D4, the gene for protein ColECol was constructed from synthetic oligonucleotides designed to mimic
codon usage in highly-expressed E. coli genes. In contrast to GST-ColECol, expression from a GST-human Type I α1
gene fusion (pHCol) identical to GST-ColECol in coded amino acid sequence but containing the human codon
distribution could not be detected in Coomassie blue-stained SDS-PAGE gels of total cell lysates of induced JM109 (F-)/
pHCol cultures (Fig. 35). The gene for the Type I α1 collagen polypeptide was cloned by polymerase chain reaction
of the gene from mRNA isolated from human foreskin cells (HS27, ATCC 1634) with primers designed from the
published gene sequence (GenBank Z74615). The 5' primer added a flanking EcoR I recognition site and the 3' primer a
flanking Hind III recognition site. The gene was cloned into the EcoR I/Hind III site of plasmid pBSKS+ (Stratagene, La
Jolla, CA), four mutations corrected using the ExSite mutagenesis kit (Stratagene, La Jolla, CA), the sequence
confirmed by dideoxy sequencing, and finally the EcoR I/Xho I fragment subcloned into plasmid pGEX-4T.1 (Pharmacia,
Piscataway, NJ). The GST-HCol gene is expression-competent because a protein of the same molecular weight as
GST-ColECol is detected when immunoblots of total cell lysates are probed with an anti-Type I collagen antibody. Thus,
sequence or structural differences between the genes for ColECol and HCol are critical determinants of expression
efficiency in E. coli. This is likely due to the codon distribution in these genes and ultimately to differences in tRNA
isoacceptor levels in E. coli compared to humans. GST-ColECol, GST-D4, and GST-HCol do not accumulate in
hyperosmotic shock media when proline is substituted for hydroxyproline or in rich media. A possible explanation is that the
trans-4-hydroxyproline-containing proteins may be resistant to degradation because they fold into a protease-resistant
triple helix while the proline-containing proteins do not adopt this structure. The large number of codons non-optimal
for E. coli found in the human gene and the instability of proline-containing collagen in E. coli may, in part, explain why
expression of human collagen in E. coli has not been previously reported.

[0179] As discussed above, collagen mimetic polypeptides, i.e., engineered polypeptides having certain
compositional and structural traits in common with collagen, are also provided herein. Such collagen mimetic polypeptides may
also be made to incorporate amino acid analogs as described above. GST-CM4 consists of glutathione S-transferase
fused to 30 repeats of a Gly-X-Y sequence. The Gly-X-Y repeating section mimics the Gly-X-Y repeating unit of human
collagen and is referred to as collagen mimetic 4 or CM4 herein. Thus, the hydroxyproline-incorporating technology
was also demonstrated to work with a protein and DNA sequence analogous to that found in human collagen. Amino
acid analysis of purified CM4 protein expressed in E. coli strain JM109 (F-) under hydroxyproline-incorporating conditions,
compared to analysis of the same protein expressed under proline-incorporating conditions, demonstrates that the
techniques herein result in essentially complete substitution of hydroxyproline for proline. The amino acid analysis was
performed on CM4 protein that had been cleaved from and purified away from GST. This removes any possible
ambiguities associated with the fusion protein.

[0180] Expression in media containing at least about 200 mM NaCl is preferable to accumulate a significant amount
of protein containing hydroxyproline. A concentration of about 400-500 mM NaCl appears to be optimal. Either KCl,
sucrose or combinations thereof may be used in substitution of or with NaCl. However, expression in media without
an added osmolyte (i.e. under conditions that more closely mimic those of Deming et al., In Vivo Incorporation of Proline
Analogs into Artificial Protein, Poly. Mater. Sci. Engin. Proceed., supra.) did not result in significant expression of
hydroxyproline-containing proteins in JM109 (F-). This is illustrated in Figure 36, which is a scan of an SDS-PAGE gel
showing the expression of GST-CM4 in media with or without 500 mM NaCl and containing either proline or
hydroxyproline. The SDS-PAGE gel reflects 5 hour post-induction samples of GST-CM4 expressed in JM109 (F-). Equivalent
amounts, based on OD600nm, of each culture were loaded in each lane. Gels were stained with Coomassie Blue,
destained, and scanned on a PDI 420oe scanner. Lane 1: 2.5 mM proline/0 mM NaCl. Lane 2: 2.5 mM proline/500 mM
NaCl. Lane 3: 80 mM hydroxyproline/0 mM NaCl. Lane 4: 80 mM hydroxyproline/500 mM NaCl. Lane 5: Molecular weight
markers. The lower arrow indicates the migration position of proline-containing GST-CM4 in lanes 1 and 2. The upper
arrow indicates the migration position of hydroxyproline-containing GST-CM4 in lanes 3 and 4. Note that GST-CM4
expressed in the presence of hydroxyproline runs at a higher apparent molecular weight (compare lanes 1 and 4). This
is expected since hydroxyproline is of greater molecular weight than proline. If all the prolines in GST-CM4 are
substituted with hydroxyproline, the increase in molecular weight is 671 Da (+2%). Note also that protein expressed in the
presence of proline accumulates in cultures irrespective of the NaCl concentration (compare lanes 1 and 2). In contrast,
significant expression in the presence of hydroxyproline only occurs in the culture containing 500 mM NaCl (compare
lanes 3 and 4). Figure 37 further illustrates the dependence of expression on NaCl concentration by showing that
significant expression of GST-CM4 occurs only at NaCl concentrations greater than 200 mM. The SDS-PAGE gel reflects
6 hour post-induction samples of GST-CM4 expressed in JM109 (F-) with varying concentrations of NaCl. All cultures
contained 80 mM hydroxyproline. Lane 1: 500 mM NaCl, not induced. Lanes 2-6: 500 mM, 400 mM, 300 mM, 200 mM,
and 100 mM NaCl, respectively. All induced with 1.5 mM IPTG. Lane 7: Molecular weight markers. The arrow indicates
the migration position of hydroxyproline-containing GST-CM4. Figure 38 is a scan of an SDS-PAGE gel of expression
of GST-CM4 in either 400 mM NaCl or 800 mM sucrose. The SDS-PAGE gel reflects 4 hour post-induction samples
of GST-CM4 expressed in JM109 (F-). All cultures contained 80 mM hydroxyproline and all, except that electrophoresed
in lane 2, contained 400 mM NaCl. Lane 2 demonstrates expression in sucrose in lieu of NaCl. Lane 1: Molecular
weight markers. Lane 2: 800 mM sucrose (no NaCl). Lanes 3-9: 0 mM, 0.025 mM, 0.1 mM, 0.4 mM, 0.8 mM, 1.25 mM,
2.5 mM proline, respectively. The upper arrow indicates the migration position of hydroxyproline-containing GST-CM4
and the lower arrow indicates the migration position of proline-containing GST-CM4. Expression is apparent in both
cases (compare lanes 2 and 3).
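
The 671 Da figure quoted above can be related to the number of substituted prolines, because each proline to hydroxyproline substitution adds one oxygen atom. The sketch below is a back-of-the-envelope check only: the average atomic mass of oxygen (15.999 Da) is a standard value, whereas the proline count and the total mass it returns are inferred from the stated 671 Da and +2% and are not given in the text.

```python
# Mass added per Pro -> Hyp substitution: one oxygen atom (average atomic mass).
OXYGEN_AVG_MASS_DA = 15.999


def hydroxylation_mass_shift(n_proline: int) -> float:
    """Mass increase (Da) when n_proline residues are all replaced by Hyp."""
    return n_proline * OXYGEN_AVG_MASS_DA


def implied_proline_count(shift_da: float) -> int:
    """Number of substituted prolines implied by an observed mass increase."""
    return round(shift_da / OXYGEN_AVG_MASS_DA)


def implied_total_mass(shift_da: float, fractional_increase: float) -> float:
    """Protein mass implied by a shift quoted as a fraction of the total."""
    return shift_da / fractional_increase


if __name__ == "__main__":
    n = implied_proline_count(671.0)
    print("implied prolines:", n)
    print("shift for that count (Da):", round(hydroxylation_mass_shift(n), 1))
    print("implied protein mass (Da):", round(implied_total_mass(671.0, 0.02)))
```

A shift of this size on a protein of roughly 33 to 34 kDa is small, which is consistent with the modest difference in migration between the two arrows described for Figure 36.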

[0181] If expression of GST-CM4, as described in Example 17 below, is performed in varying ratios of hydroxyproline
and proline, the expressed protein appears to contain varying amounts of hydroxyproline. Thus, if only hydroxyproline
is present during expression, a single expressed protein of the expected molecular weight is evident on an SDS-PAGE
gel (Figure 38, lane 3). If greater than approximately 1 mM proline is present, again a single expressed protein is
evident, but at a lower apparent molecular weight, as expected for the protein containing only proline (Figure 38, lanes
7-9). If lesser amounts of proline are used during expression, species of apparent molecular weight intermediate between
these extremes are evident. This phenomenon, evident as a "smear" or "ladder" of proteins running between the two
molecular weight extremes on an SDS-PAGE gel, is illustrated in lanes 3-9 of Figure 38. Lanes 3-9 on this gel are
proteins from expression in a fixed concentration of 80 mM hydroxyproline and 400 mM NaCl. However, in moving
from lane 3 to 9 the proline concentration increases from none (lane 3) to 2.5 mM (lane 9) and expression shifts from
a protein of higher molecular weight (hydroxyproline-containing GST-CM4) to lower molecular weight
(proline-containing GST-CM4). At proline concentrations of 0.025 mM and 0.1 mM, species of intermediate molecular weight are
apparent (lanes 4 and 5). This clearly demonstrates that the percent incorporation of hydroxyproline in an expressed
protein can be controlled by expression in varying ratios of analogue to amino acid.
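
To make the ratios behind lanes 3-9 of Figure 38 concrete, the sketch below tabulates the molar ratio of hydroxyproline to proline in each lane from the concentrations given above. It describes only the media composition; it does not predict the fraction of hydroxyproline actually incorporated, which the text reports qualitatively from the gel. The function names are illustrative.

```python
# Media composition for lanes 3-9 of Figure 38 (values taken from the text).
HYP_MM = 80.0
PROLINE_MM_BY_LANE = {3: 0.0, 4: 0.025, 5: 0.1, 6: 0.4, 7: 0.8, 8: 1.25, 9: 2.5}


def hyp_to_pro_ratio(pro_mm: float, hyp_mm: float = HYP_MM) -> float:
    """Molar excess of hydroxyproline over proline; infinite when no proline."""
    return float("inf") if pro_mm == 0 else hyp_mm / pro_mm


def proline_mole_fraction(pro_mm: float, hyp_mm: float = HYP_MM) -> float:
    """Fraction of the Pro + Hyp pool in the medium that is proline."""
    return pro_mm / (pro_mm + hyp_mm)


if __name__ == "__main__":
    for lane, pro in PROLINE_MM_BY_LANE.items():
        ratio = hyp_to_pro_ratio(pro)
        label = "no proline" if ratio == float("inf") else f"{ratio:.0f}:1"
        print(f"lane {lane}: Hyp:Pro = {label}, "
              f"proline fraction = {proline_mole_fraction(pro):.4%}")
```

Even in lane 9 hydroxyproline is in 32-fold molar excess, yet the text reports a band at the proline-containing position, which illustrates why residual proline has to be depleted before induction.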

[0182] Proline starvation prior to hydroxyproline incorporation is an important technique used herein. It ensures that
no residual proline is present during expression to compete with hydroxyproline. This enables essentially 100%
substitution with the analogue. As shown in Figure 38, starvation conditions allow expression under precisely controlled
ratios of proline and hydroxyproline. The amount of hydroxyproline vs. proline incorporated into the recombinant protein
can therefore be controlled. Thus, particular properties of the recombinant protein that depend upon the relative amount
of analogue incorporated can be tailored by the present methodology to produce polypeptides with unique and
beneficial properties.

[0183] Human collagen, collagen fragments, collagen-like peptides (collagen mimetics) and the above chimeric
polypeptides produced by recombinant processes have distinct advantages over collagen and its derivatives obtained
from non-human animals. Since the human gene is used, the collagen will not act as a xenograft in the context of a
medical implant. Moreover, unlike naturally occurring collagen, the extent of proline hydroxylation can be
predetermined. This unprecedented degree of control permits detailed investigation of the contribution of trans-4-hydroxyproline
to triple helix stabilization, fibril formation and biological activity. In addition, design of medical implants based upon
the desired strength of collagen fibrils is enabled.

[0184] The following examples are included for purposes of illustration and are not to be construed as limitations 
herein. 

EXAMPLE 1

Trans-membrane Transport 

[0185] A 5 mL culture of E. coli strain DH5α (supE44 ΔlacU169 (φ80lacZ ΔM15) hsdR17 recA1 endA1 gyrA96 thi-1
relA1) containing a plasmid conferring resistance to ampicillin (pMAL-c2, Fig. 1) was grown in Luria Broth to confluency
(~16 hours from inoculation). These cells were used to inoculate a 1 L shaker flask containing 500 mL of M9 minimal
medium (M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin supplemented with all amino acids at 20
µg/mL) which was grown to an AU600 of 1.0 (18-20 hours). The culture was divided in half and the cells harvested by
centrifugation. The cells from one culture were resuspended in 250 mL M9 media and those from the other in 250 mL
of M9 media containing 0.5M NaCl. The cultures were equilibrated in an air shaker for 20 minutes at 37 °C (225 rpm)
and divided into ten 25 mL aliquots. The cultures were returned to the shaker and 125 µL of 1M hydroxyproline in
distilled H2O was added to each tube. At 2, 4, 8, 12, and 20 minutes, 4 culture tubes (2 isotonic, 2 hypertonic) were
vacuum filtered onto 1 µm polycarbonate filters that were immediately placed into 2 mL microfuge tubes containing
1.2 mL of 0.2M NaOH/2% SDS in distilled H2O. After overnight lysis, the filters were carefully removed from the tubes,
and the supernatant buffer was assayed for hydroxyproline according to the method of Grant, Journal of Clinical
Pathology, 17:685 (1964). The intracellular concentration of trans-4-hydroxyproline versus time is illustrated graphically
in Figure 2.
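
As a check on the quantities in Example 1, the sketch below computes the final hydroxyproline concentration after spiking each 25 mL aliquot and confirms that the sampling schedule uses exactly the available tubes. Only the numbers come from the text; the function and variable names are illustrative.

```python
def final_concentration_mM(stock_mM: float, added_uL: float, culture_mL: float) -> float:
    """Concentration after adding a small stock volume to a culture (C1*V1 = C2*V2)."""
    added_mL = added_uL / 1000.0
    return stock_mM * added_mL / (culture_mL + added_mL)


# Values from Example 1: 125 µL of 1 M hydroxyproline into each 25 mL aliquot.
HYP_FINAL_MM = final_concentration_mM(stock_mM=1000.0, added_uL=125.0, culture_mL=25.0)

# Sampling plan: at each time point, 2 isotonic and 2 hypertonic tubes are filtered.
TIME_POINTS_MIN = [2, 4, 8, 12, 20]
TUBES_PER_TIME_POINT = {"isotonic": 2, "hypertonic": 2}
TUBES_NEEDED = len(TIME_POINTS_MIN) * sum(TUBES_PER_TIME_POINT.values())
TUBES_AVAILABLE = 2 * 10  # two 250 mL cultures, each divided into ten 25 mL aliquots

if __name__ == "__main__":
    print(f"final hydroxyproline: {HYP_FINAL_MM:.2f} mM")
    print(f"tubes needed / available: {TUBES_NEEDED} / {TUBES_AVAILABLE}")
```

The spike therefore gives close to 5 mM hydroxyproline in every tube, and the five time points consume all twenty aliquots.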

EXAMPLE 2

Effects of Salt Concentration on Transmembrane Transport 

[0186] To determine the effects of salt concentration on transmembrane transport, an approach similar to Example
1 was taken. A 5 mL culture of E. coli strain DH5α (supE44 ΔlacU169 (φ80lacZ ΔM15) hsdR17 recA1 endA1 gyrA96 thi-1
relA1) containing a plasmid conferring resistance to ampicillin (pMAL-c2, Fig. 1) was grown in Luria Broth to confluency
(~16 hours from inoculation). These cells were used to inoculate a 1 L shaker flask containing 500 mL of M9 minimal
medium (M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin supplemented with all amino acids at 20
µg/mL) that was then grown to an AU600 of 0.6. The culture was divided into three equal parts, the cells in each collected
by centrifugation and resuspended in 150 mL M9 media, 150 mL M9 media containing 0.5M NaCl, and 150 mL M9
media containing 1.0M NaCl, respectively. The cultures were equilibrated for 20 minutes on a shaker at 37 °C (225 rpm)
and then divided into six 25 mL aliquots. The cultures were returned to the shaker and 125 µL of 1M hydroxyproline
in distilled H2O was added to each tube. At 5 and 15 minutes, 9 culture tubes (3 isotonic, 3 x 0.5M NaCl, and 3 x 1.0M
NaCl) were vacuum filtered onto 1 µm polycarbonate filters that were immediately placed into 2 mL microfuge tubes
containing 1.2 mL of 0.2M NaOH/2% SDS in distilled H2O. After overnight lysis, the filters were removed from the tubes
and the supernatant buffer assayed for hydroxyproline according to the method of Grant, supra.

EXAMPLE 2A 

Effects of Salt Concentration on Transmembrane Transport

[0187] To determine the effects of salt concentration on transmembrane transport, an approach similar to Example
1 was taken. A saturated culture of JM109 (F-) harboring plasmid pD4 (Fig. 48) growing in Luria Broth (LB) containing
100 µg/ml ampicillin (Amp) was used to inoculate 20 ml cultures of LB/Amp to an OD at 600 nm of 0.1 AU. The cultures
were grown with shaking at 37°C to an OD 600 nm between 0.7 and 1.0 AU. Cells were collected by centrifugation
and washed with 10 ml of M9 media. Each cell pellet was resuspended in 20 ml of M9/Amp media supplemented with
0.5% glucose and 100 µg/ml of all of the amino acids except proline. Cultures were grown at 37°C for 30 min. to deplete
endogenous proline. After out-growth, NaCl was added to the indicated concentration, Hyp was added to 40 mM, and
IPTG to 1.5 mM. After 3 hours at 37°C, cells from three 5 ml aliquots of each culture were collected separately on
polycarbonate filters and washed twice with five ml of M9 media containing 0.5% glucose and the appropriate
concentration of NaCl. Cells were lysed in 1 ml of 70% ethanol by vortexing for 30 min. at room temperature. Cell lysis
supernatants were taken to dryness, resuspended in 100 µl of 2.5 N NaOH, and assayed for Hyp by the method of Neuman
and Logan, R.E. Neuman and M.A. Logan, Journal of Biological Chemistry, 184:299 (1950). Total protein was
determined with the BCA kit (Pierce, Rockford, IL) after cell lysis by three sonication/freeze-thaw cycles. The data are the
mean ± standard error of three separate experiments. The intracellular concentration of trans-4-hydroxyproline versus
NaCl concentration is illustrated graphically in Figure 2A.
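
The "mean ± standard error of three separate experiments" reported for Example 2A can be computed as sketched below. The normalization of hydroxyproline to total protein is an assumption suggested by the BCA measurement; the text does not state the exact units plotted in Figure 2A. The numbers in the usage section are hypothetical placeholders, not data from the patent.

```python
import math
import statistics


def hyp_per_mg_protein(hyp_nmol: float, protein_mg: float) -> float:
    """Normalize a hydroxyproline measurement to total protein (assumed units)."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return hyp_nmol / protein_mg


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    # statistics.stdev is the sample standard deviation (n - 1 denominator)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


if __name__ == "__main__":
    # HYPOTHETICAL replicates: (Hyp in nmol, total protein in mg) for one NaCl level
    replicates = [(12.0, 0.50), (15.0, 0.60), (13.5, 0.50)]
    normalized = [hyp_per_mg_protein(h, p) for h, p in replicates]
    mean, sem = mean_and_sem(normalized)
    print(f"{mean:.2f} ± {sem:.2f} nmol Hyp per mg protein")
```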
EXAMPLE 3

Determination of Proline Starvation Conditions in E. coli

[0188] Proline auxotrophic E. coli strain NM519 (pro-) including plasmid pMAL-c2, which confers ampicillin resistance,
was grown in M9 minimal medium (M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin supplemented
with all amino acids at 20 µg/mL except proline which was supplemented at 12.5 mg/L) to a constant AU600 of 0.53
AU (17 hours post-inoculation). Hydroxyproline was added to 0.08M and hydroxyproline-dependent growth was
demonstrated by the increase in the OD600 to 0.61 AU over a one hour period.


EXAMPLE 4 

Hydroxyproline Incorporation Into Protein in E. coli Under Proline Starvation Conditions 

[0189] Plasmid pMAL-c2 (commercially available from New England Biolabs) containing DNA encoding for
maltose-binding protein (MBP) was used to transform proline auxotrophic E. coli strain NM519 (pro-). Two 1 L cultures of
transformed NM519 (pro-) in M9 minimal medium (M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin
supplemented with all amino acids at 20 µg/mL except proline which was supplemented at 12.5 mg/L) were grown to an
AU600 of 0.53 (~17 hours post-inoculation). The cells were harvested by centrifugation, the media in one culture was
replaced with an equal volume of M9 media containing 0.08M hydroxyproline and the media in the second culture was
replaced with an equal volume of M9 media containing 0.08M hydroxyproline and 0.5M NaCl. After a one hour
equilibration, the cultures were induced with 1mM isopropyl-β-D-thiogalactopyranoside. After growing for an additional 3.25
hours, cells were harvested by centrifugation, resuspended in 10 mL of 10mM Tris-HCl (pH 8), 1mM EDTA, 100mM
NaCl (TEN buffer), and lysed by freezing and sonication. MBP was purified by passing the lysates over 4 mL amylose
resin spin columns, washing the columns with 10 mL of TEN buffer, followed by elution of bound MBP with 2 mL of
TEN buffer containing 10mM maltose. Eluted samples were sealed in ampules under nitrogen with an equal volume
of concentrated HCl (11.7M) and hydrolysed for 12 hours at 120 °C. After clarification with activated charcoal,
hydroxyproline content in the samples was determined by HPLC and the method of Grant, supra. The percent incorporation
of trans-4-hydroxyproline compared to proline into MBP is shown graphically in Figure 12.
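
One plausible way to express the "percent incorporation of trans-4-hydroxyproline compared to proline" is as the hydroxyproline share of the combined Hyp + Pro content of the hydrolysate. The sketch below uses that definition as an assumption; the text does not give the formula behind Figure 12, and the example amounts are hypothetical.

```python
def percent_hyp_incorporation(hyp_nmol: float, pro_nmol: float) -> float:
    """Hydroxyproline as a percentage of total (Hyp + Pro) in a hydrolysate.

    Assumed definition; the source does not spell out how Figure 12 was computed.
    """
    if hyp_nmol < 0 or pro_nmol < 0:
        raise ValueError("amounts must be non-negative")
    total = hyp_nmol + pro_nmol
    if total == 0:
        raise ValueError("no proline or hydroxyproline detected")
    return 100.0 * hyp_nmol / total


if __name__ == "__main__":
    # HYPOTHETICAL amounts for a single purified MBP sample
    print(f"{percent_hyp_incorporation(hyp_nmol=9.0, pro_nmol=1.0):.1f}% Hyp")
```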


EXAMPLE 5 


Hydroxyproline Incorporation Into Protein in S. cerevisiae via Integrating Vectors Under Proline Starvation Conditions 

[0190] The procedure described in Example 4 above is performed in yeast using an integrating vector which disrupts
the proline biosynthetic pathway. A gene encoding human Type I (α1) collagen is inserted into a unique shuttle vector
behind the inducible GAL10 promoter. This promoter/gene cassette is flanked by a 5' and 3' terminal sequence derived
from a S. cerevisiae proline synthetase gene. The plasmid is linearized by restriction digestion in both the 5' and 3'
terminal regions and used to transform a proline-prototrophic S. cerevisiae strain. The transformation mixture is plated
onto selectable media and transformants are selected. By homologous recombination and gene disruption, the
construct simultaneously forms a stable integration and converts the S. cerevisiae strain into a proline auxotroph. A single
transformant is selected and grown at 30 °C in YPD media to an OD600 of 2 AU. The culture is centrifuged and the
cells resuspended in yeast dropout media supplemented with all amino acids except proline and grown to a constant
OD600 indicating proline starvation conditions. 0.08M L-hydroxyproline and 2% (w/v) galactose are then added. Cultures
are grown for an additional 6-48 hours. Cells are harvested by centrifugation (5000 rpm, 10 minutes) and lysed by
mechanical disruption. Hydroxyproline-containing human Type I (α1) collagen is purified by ammonium sulfate
fractionation and column chromatography.

EXAMPLE 6 


Hydroxyproline Incorporation Into Protein in S. cerevisiae via Non-Integrating Vectors Under Proline Starvation 
Conditions 

[0191] The procedure described above in Example 4 is performed in a yeast proline auxotroph using a non-integrating
vector. A gene encoding human Type I (α1) collagen is inserted behind the inducible GAL10 promoter in the YEp24
shuttle vector that contains the selectable Ura+ marker. The resulting plasmid is transformed into proline auxotrophic
S. cerevisiae by spheroplast transformation. The transformation mixture is plated on selectable media and
transformants are selected. A single transformant is grown at 30 °C in YPD media to an OD600 of 2 AU. The culture is centrifuged
and the cells resuspended in yeast dropout media supplemented with all amino acids except proline and grown to a
constant OD600 indicating proline starvation conditions. 0.08M L-hydroxyproline and 2% (w/v) galactose are then added.
Cultures are grown for an additional 6-48 hours. Cells are harvested by centrifugation (5000 rpm, 10 minutes) and
lysed by mechanical disruption. Hydroxyproline-containing human Type I (α1) collagen is purified by ammonium sulfate
fractionation and column chromatography.

EXAMPLE 7 

Hydroxyproline Incorporation Into Protein in a Baculovirus Expression System 


[0192] A gene encoding human Type I (α1) collagen is inserted into the pBacPAK8 baculovirus expression vector
behind the AcMNPV polyhedrin promoter. This construct is co-transfected into SF9 cells along with linearized AcMNPV
DNA by standard calcium phosphate co-precipitation. Transfectants are cultured for 4 days at 27 °C in TNM-FH media
supplemented with 10% FBS. The media is harvested and recombinant virus particles are isolated by a plaque assay.
Recombinant virus is used to infect 1 liter of SF9 cells growing in Grace's media minus proline supplemented with 10%
FBS and 0.08 M hydroxyproline. After growth at 27 °C for 2-10 days, cells are harvested by centrifugation and lysed
by mechanical disruption. Hydroxyproline-containing human Type I (α1) collagen is purified by ammonium sulfate
fractionation and column chromatography.


EXAMPLE 8 

Hydroxyproline Incorporation Into Human Collagen Protein in Escherichia coli Under Proline Starvation Conditions 

[0193] A plasmid (pHuCol, Fig. 4) encoding the gene sequence of human Type I (α1) collagen (Figures 3A and 3B)
(SEQ. ID. NO. 1) placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac promoter and also
encoding β-lactamase is transformed into Escherichia coli proline auxotrophic strain NM519 (pro-) by standard heat shock
transformation. Transformation cultures are plated on Luria Broth (LB) containing 100 µg/ml ampicillin and after
overnight growth a single ampicillin-resistant colony is used to inoculate 5 ml of LB containing 100 µg/ml ampicillin. After
growth for 10-16 hours with shaking (225 rpm) at 37 °C, this culture is used to inoculate 1 L of M9 minimal medium
(M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin, supplemented with all amino acids at 20 µg/mL
except proline which is supplemented at 12.5 mg/L) in a 1.5 L shaker flask. After growth at 37 °C, 225 rpm, for 15-20
hours post-inoculation, the optical density at 600 nm is constant at approximately 0.5 OD/mL. The cells are harvested
by centrifugation (5000 rpm, 5 minutes), the media decanted, and the cells resuspended in 1 L of M9 minimal media
containing 100 µg/mL ampicillin, 0.08M L-hydroxyproline, and 0.5M NaCl. Following growth for 1 hour at 37 °C, 225
rpm, IPTG is added to 1mM and the cultures allowed to grow for an additional 5-15 hours. Cells are harvested by
centrifugation (5000 rpm, 10 minutes) and lysed by mechanical disruption. Hydroxyproline-containing collagen is
purified by ammonium sulfate fractionation and column chromatography.

EXAMPLE 9

Hydroxyproline Incorporation Into Fragments of Human Collagen Protein in Escherichia coli Under Proline Starvation 
Conditions 

[0194] A plasmid (pHuCol-FI, Figure 6) encoding the gene sequence of the first 80 amino acids of human Type I
(α1) collagen (Figure 5) (SEQ. ID. NO. 2) placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac
promoter and also encoding β-lactamase is transformed into Escherichia coli proline auxotrophic strain NM519 (pro-)
by standard heat shock transformation. Transformation cultures are plated on Luria Broth (LB) containing 100 µg/mL
ampicillin and after overnight growth a single ampicillin-resistant colony is used to inoculate 5 mL of LB containing 100
µg/mL ampicillin. After growth for 10-16 hours with shaking (225 rpm) at 37 °C, this culture is used to inoculate 1 L of
M9 minimal medium (M9 salts, 2% glucose, 0.01 mg/mL thiamine, 100 µg/mL ampicillin, supplemented with all amino
acids at 20 µg/mL except proline which is supplemented at 12.5 mg/L) in a 1.5 L shaker flask. After growth at 37 °C,
225 rpm, for 15-20 hours post-inoculation, the optical density at 600 nm is constant at approximately 0.5 OD/mL. The
cells are harvested by centrifugation (5000 rpm, 5 minutes), the media decanted, and the cells resuspended in 1 L of
M9 minimal media containing 100 µg/mL ampicillin, 0.08M L-hydroxyproline, and 0.5M NaCl. Following growth for 1
hour at 37 °C, 225 rpm, IPTG is added to 1mM and the cultures allowed to grow for an additional 5-15 hours. Cells
are harvested by centrifugation (5000 rpm, 10 minutes) and lysed by mechanical disruption. The
hydroxyproline-containing collagen fragment is purified by ammonium sulfate fractionation and column chromatography.
EXAMPLE 10

Construction and Expression in E. coli of the Human Collagen Type I (α1) Gene with Optimized E. coli Codon Usage

A. Construction of the gene:

[0195] The nucleotide sequence of the helical region of human collagen Type I (α1) gene flanked by 17 amino acids
of the amino terminal extra-helical and 26 amino acids of the C-terminal extra-helical region is shown in Figure 27
(SEQ. ID. NO. 15). A tabulation of the codon frequency of this gene is given in Table I. The gene sequence shown in
Figure 27 was first changed to reflect E. coli codon bias. An initiating methionine was inserted at the 5' end of the gene
and a TAAT stop sequence at the 3' end. Unique restriction sites were identified or created approximately every 150
base pairs. The resulting gene (HUCol EC, Figure 39A-39E) (SEQ. ID. NO. 20) has the codon usage given in Table II
as shown below. Other sequences that approximate E. coli codon bias are also acceptable.



TABLE II

Codon    Count   %age    Codon    Count   %age    Codon    Count   %age    Codon    Count   %age
TTT-Phe      6   0.56    TCT-Ser      3   0.28    TAT-Tyr      2   0.18    TGT-Cys      0   0.00
TTC-Phe      9   0.85    TCC-Ser      3   0.28    TAC-Tyr      2   0.18    TGC-Cys      0   0.00
TTA-Leu      0   0.00    TCA-Ser      0   0.00    TAA-***      0   0.00    TGA-***      0   0.00
TTG-Leu      0   0.00    TCG-Ser      0   0.00    TAG-***      0   0.00    TGG-Trp      0   0.00
CTT-Leu      0   0.00    CCT-Pro     13   1.22    CAT-His      0   0.00    CGT-Arg     26   2.45
CTC-Leu      1   0.09    CCC-Pro     12   1.13    CAC-His      3   0.28    CGC-Arg     26   2.45
CTA-Leu      1   0.09    CCA-Pro     29   2.74    CAA-Gln      5   0.47    CGA-Arg      0   0.00
CTG-Leu     19   1.79    CCG-Pro    186  17.58    CAG-Gln     25   2.36    CGG-Arg      1   0.09
ATT-Ile      3   0.28    ACT-Thr      2   0.18    AAT-Asn      0   0.00    AGT-Ser      1   0.09
ATC-Ile      4   0.37    ACC-Thr     11   1.03    AAC-Asn     11   1.03    AGC-Ser     32   3.02
ATA-Ile      0   0.00    ACA-Thr      0   0.00    AAA-Lys     38   3.59    AGA-Arg      0   0.00
ATG-Met      8   0.75    ACG-Thr      4   0.37    AAG-Lys      0   0.00    AGG-Arg      0   0.00
GTT-Val      3   0.28    GCT-Ala     10   0.94    GAT-Asp     20   1.89    GGT-Gly    148  13.98
GTC-Val      5   0.47    GCC-Ala     24   2.26    GAC-Asp     14   1.32    GGC-Gly    178  16.82
GTA-Val      0   0.00    GCA-Ala      8   0.75    GAA-Glu     40   3.78    GGA-Gly      9   0.85
GTG-Val     12   1.13    GCG-Ala     80   7.56    GAG-Glu      9   0.85    GGG-Gly     12   1.13
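
A codon tabulation of the kind shown in Table II can be generated from any in-frame coding sequence as sketched below. This is an illustrative tally, not the procedure used by the inventors, and the short test sequence is hypothetical. The last lines check one table entry: the counts in Table II sum to 1058 codons, and 186 CCG codons out of 1058 is about 17.58%.

```python
from collections import Counter


def codon_usage(dna: str) -> dict:
    """Return {codon: (count, percent of all codons)} for an in-frame sequence."""
    seq = dna.upper().replace(" ", "").replace("\n", "")
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a non-zero multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: (n, 100.0 * n / total) for codon, n in counts.items()}


if __name__ == "__main__":
    # HYPOTHETICAL six-codon stretch, used only to exercise the function
    usage = codon_usage("GGCCCGCCGGGTCCGGCG")
    for codon, (count, pct) in sorted(usage.items()):
        print(f"{codon}  {count:3d}  {pct:6.2f}")

    # Check of one Table II entry: CCG count over the sum of all counts
    print(f"CCG share in Table II: {100.0 * 186 / 1058:.2f}%")
```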







[0196] Oligos of approximately 80 nucleotides were synthesized on a Beckman Oligo 1000 DNA synthesizer, cleaved
and deprotected with aqueous NH4OH, and purified by electrophoresis in 7M urea/12% polyacrylamide gels. Each set
of oligos was designed to have an EcoR I restriction enzyme site at the 5' end, a unique restriction site near the 3' end,
followed by the TAAT stop sequence and a Hind III restriction enzyme site at the very 3' end. The first four oligos,
comprising the first 81 amino acids of the human collagen Type I (α1) gene, are given in Figure 40, which shows the
sequence and restriction maps of synthetic oligos used to construct the first 243 base pairs of the human Type I (α1)
collagen gene with optimized E. coli codon usage. Oligos N1-1 (SEQ. ID. NO. 21) and N1-2 (SEQ. ID. NO. 22) were
designed to insert an initiating methionine (ATG) codon at the 5' end of the gene.
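
The relationship between the 81 amino acids and the 243 base pairs mentioned above, and the idea of writing a collagen sequence with E. coli-preferred codons, can be illustrated with a simple back-translation. The sketch below picks, for each amino acid, the codon with the highest count in Table II (ties broken arbitrarily). This is a simplification adopted here for illustration: the actual gene uses a mixture of codons, as the table shows, and the peptide in the usage section is hypothetical rather than taken from the sequence listing.

```python
# Most frequent codon per amino acid in Table II (Arg and Tyr are ties there;
# CGT and TAT are picked arbitrarily). Cys and Trp do not occur in the table.
PREFERRED_CODON = {
    "G": "GGC", "P": "CCG", "A": "GCG", "E": "GAA", "K": "AAA", "R": "CGT",
    "S": "AGC", "Q": "CAG", "D": "GAT", "L": "CTG", "V": "GTG", "T": "ACC",
    "N": "AAC", "F": "TTC", "M": "ATG", "I": "ATC", "H": "CAC", "Y": "TAT",
}

START_CODON = "ATG"   # initiating methionine described in the text
STOP_SEQUENCE = "TAAT"  # stop sequence described in the text


def back_translate(peptide: str) -> str:
    """Write a peptide (one-letter code) using one preferred codon per residue."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in peptide.upper())
    except KeyError as exc:
        raise ValueError(f"no codon listed for residue {exc.args[0]!r}") from None


def with_start_and_stop(coding: str) -> str:
    """Add the initiating ATG and the TAAT stop sequence around a coding region."""
    return START_CODON + coding + STOP_SEQUENCE


if __name__ == "__main__":
    peptide = "GPPGAPGEK"  # HYPOTHETICAL Gly-X-Y stretch
    coding = back_translate(peptide)
    print(coding, len(coding), "bp")
    print(with_start_and_stop(coding))
    print("81 residues ->", len(back_translate("G" * 81)), "bp")
```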

[0197] In one instance, oligos N1-1 and N1-2 (1 µg each) were annealed in 20 µL of T7 DNA polymerase buffer
(40mM Tris-HCl (pH 8.0), 5mM MgCl2, 5mM dithiothreitol, 50mM NaCl, 0.05 mg/mL bovine serum albumin) by heating
at 90°C for 5 minutes followed by slow cooling to room temperature. After brief centrifugation at 14,000 rpm, 10 units
of T7 DNA polymerase and 2 µL of a solution of all four dNTPs (dATP, dGTP, dCTP, dTTP, 2.5mM each) were added
to the annealed oligos. Extension reactions were incubated at 37°C for 30 minutes and then heated at 70°C for 10
minutes. After cooling to room temperature, Hind III buffer (5 µL of 10x concentration), 20 µL of H2O, and 10 units of
Hind III restriction enzyme were added and the tubes incubated at 37°C for 10 hours. Hind III buffer (2 µL of 10x
concentration), 13.5 µL of 0.5M Tris-HCl (pH 7.5), 1.8 µL of 1% Triton X100, 5.6 µL of H2O, and 20 U of EcoR I were
added to each tube and incubation continued for 2 hours at 37°C. Digests were extracted once with an equal volume of
phenol, once with phenol/chloroform/isoamyl alcohol, and once with chloroform/isoamyl alcohol. After ethanol
precipitation, the pellet was resuspended in 10 µL of TE buffer (10mM Tris-HCl (pH 8.0), 1mM EDTA). Resuspended
pellet (4 µL) was ligated overnight at 16°C with agarose gel-purified EcoR I/Hind III-digested pBSKS+ vector (1 µg)
using T4 DNA ligase (100 units). One half of the ligation mixture was transformed by heat shock into DH5α cells and
100 µL of the 1.0 mL transformation mixture was plated on Luria Broth (LB) agar plates containing 70 µg/mL ampicillin.
Plates were incubated overnight at 37°C. Ampicillin-resistant colonies (6-12) were picked and grown overnight in LB
media containing 70 µg/mL ampicillin. Plasmid DNA was isolated from each culture by Wizard Minipreps (Promega
Corporation, Madison, WI) and screened for the presence of the approximately 120 base pair insert by digestion with
EcoR I and Hind III and running the digestion products on agarose electrophoresis gels. Clones with inserts were
confirmed by standard dideoxy termination DNA sequencing. The correct clone was named pBSN1-1 (Figure 41) and the
collagen fragment has the nucleic acid sequence given in Figure 42 (SEQ. ID. NO. 25).
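
The annealing and extension step can be pictured as two oligos pairing through complementary 3' ends, after which the polymerase fills in the single-stranded remainder of each strand. The Python sketch below models only that idea. The sequences are made-up placeholders, not the oligos of Figure 40, and the names `fill_in` and `min_overlap` are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def fill_in(top: str, bottom: str, min_overlap: int = 12) -> str:
    """Anneal two oligos (both written 5'->3') through their complementary
    3' ends and return the top strand of the fully extended duplex."""
    top = top.upper()
    bottom_rc = revcomp(bottom)  # bottom oligo in top-strand orientation
    # Search for the longest suffix of `top` that is a prefix of `bottom_rc`.
    for k in range(min(len(top), len(bottom_rc)), min_overlap - 1, -1):
        if top[-k:] == bottom_rc[:k]:
            return top + bottom_rc[k:]
    raise ValueError("no 3' overlap of the required length")


if __name__ == "__main__":
    # Placeholder duplex: EcoR I site, ATG, a few codons, TAAT stop, Hind III site.
    duplex = "GAATTCATGGGTCCGATGGGTCCGCGTGGTCCGCCGGGTTAATAAGCTT"
    top_oligo = duplex[:33]
    bottom_oligo = revcomp(duplex[21:])  # shares a 12-nt overlap with top_oligo
    print(fill_in(top_oligo, bottom_oligo) == duplex)
```

Two oligos of about 80 nucleotides with a modest overlap give a filled-in duplex somewhat shorter than their summed length, which is consistent with screening for an insert of roughly 120 base pairs after the end sites are cut.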

[0198] Oligos N1-3 (SEQ. ID. NO. 23) and N1-4 (SEQ. ID. NO. 24) (Figure 40) were synthesized, purified, annealed,
extended, and cloned into pBSKS+ following the same procedure given above for oligos N1-1 and N1-2. The resulting
plasmid was named pBSN1-2A. To clone together the sections of the collagen gene from pBSN1-1 and pBSN1-2A,
plasmid pBSN1-1 (1 µg) was digested for 2 hours at 37°C with Rsr II and Hind III. The digested vector was purified by
agarose gel electrophoresis. Plasmid pBSN1-2A (3 µg) was digested for 2 hours at 37°C with Rsr II and Hind III and
the insert purified by agarose gel electrophoresis. Rsr II/Hind III-digested pBSN1-1 was ligated with this insert overnight
at 16°C with T4 DNA ligase. One half of the ligation mixture was transformed into DH5α cells and 1/10 of the
transformation mixture was plated on LB agar plates containing 70 µg/mL ampicillin. After overnight incubation at 37°C,
ampicillin-resistant clones were picked and screened for the presence of insert DNA as described above. Clones were
confirmed by dideoxy termination sequencing. The correct clone was named pBSN1-2 (Figure 43) and the collagen
fragment has the sequence given in Figure 44.
[0199] In similar manner, the remainder of the collagen gene is constructed such that the final DNA sequence is that
given in Figures 39A-39E (SEQ. ID. NO. 19).
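
Screening clones by restriction digestion amounts to predicting fragment sizes from the positions of the recognition sites. The following Python sketch does this for a linear sequence using the standard recognition sequences of EcoR I (G^AATTC), Hind III (A^AGCTT) and Rsr II (CG^GWCCG). The function name `digest` and the test sequence are hypothetical, and treating the DNA as linear is a simplifying assumption, since a plasmid is circular.

```python
import re

# Recognition pattern and top-strand cut offset for each enzyme.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),       # G^AATTC
    "HindIII": ("AAGCTT", 1),     # A^AGCTT
    "RsrII": ("CGG[AT]CCG", 2),   # CG^GWCCG, W = A or T
}


def digest(seq: str, enzymes: list[str]) -> list[int]:
    """Return fragment lengths, in order, for a linear sequence cut by `enzymes`."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        pattern, offset = ENZYMES[name]
        # A lookahead lets overlapping sites be found as well.
        for match in re.finditer(f"(?={pattern})", seq):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    placeholder = "CC" + "GAATTC" + "ACGT" * 5 + "AAGCTT" + "GG"
    print(digest(placeholder, ["EcoRI", "HindIII"]))
```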

B) Expression of the gene in E. coli: 

[0200] Following construction of the entire human collagen Type I (α1) gene with codon usage optimized for E. coli,
the cloned gene is expressed in E. coli. A plasmid (pHuColEc, Figure 45) encoding the entire synthetic collagen gene
(Figures 39A-39E) placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac promoter and also
encoding β-lactamase is transformed into Escherichia coli strain DH5α (supE44 ΔlacU169 (φ80 lacZ ΔM15) hsdR17 recA1
endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation cultures are plated on Luria Broth
(LB) containing 100 µg/mL ampicillin and, after overnight growth, a single ampicillin-resistant colony is used to inoculate
10 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16 hours with shaking (225 rpm) at 37°C, this culture
is used to inoculate 1 L of LB containing 100 µg/mL ampicillin in a 1.5 L shaker flask. After growth at 37°C, 225 rpm,
for 2 hours post-inoculation, the optical density at 600 nm is approximately 0.5 OD/mL. IPTG is added to 1mM and the
culture allowed to grow for an additional 5-10 hours. Cells are harvested by centrifugation (5000 rpm, 10 minutes) and
lysed by mechanical disruption. Recombinant human collagen is purified by ammonium sulfate fractionation and column
chromatography. The yield is typically 15-25 mg/L of culture.
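
The additions in this protocol follow the usual dilution relation C1·V1 = C2·V2. The Python sketch below works through the IPTG and ampicillin additions for a 1 L culture and scales the stated 15-25 mg/L yield to other culture volumes. The stock concentrations (1 M IPTG, 100 mg/mL ampicillin) and the function names are assumptions for illustration; the text does not specify them.

```python
def stock_volume_ml(final_conc: float, stock_conc: float, culture_ml: float) -> float:
    """Volume of stock (mL) needed to reach `final_conc` in `culture_ml`.

    Both concentrations must be in the same unit. The small volume change
    caused by the addition itself is ignored.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc * culture_ml / stock_conc


def expected_yield_mg(culture_l: float, low: float = 15.0, high: float = 25.0):
    """Scale the stated typical yield (15-25 mg/L) to a culture volume in litres."""
    return (low * culture_l, high * culture_l)


if __name__ == "__main__":
    # 1 mM IPTG in 1 L from an assumed 1 M (1000 mM) stock.
    print(stock_volume_ml(1.0, 1000.0, 1000.0), "mL IPTG stock")
    # 100 µg/mL ampicillin in 1 L from an assumed 100 mg/mL (100000 µg/mL) stock.
    print(stock_volume_ml(100.0, 100000.0, 1000.0), "mL ampicillin stock")
    print(expected_yield_mg(1.0), "mg expected from 1 L")
```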

EXAMPLE 11 

Expression in E. coli of an 81 Amino Acid Fragment of Human Collagen Type I (α1) with Optimized E. coli Codon Usage

[0201] A plasmid (pTrcN1-2, Figure 46) encoding the gene sequence of the first 81 amino acids of human Type I
(α1) collagen with optimized E. coli codon usage, cloned in fusion with a 6 histidine tag at the 5' end of the gene,
placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible trc promoter and also encoding β-lactamase,
was constructed by subcloning the EcoR I/Hind III insert from pBSN1-2 into the EcoR I/Hind III site of plasmid pTrcB
(Invitrogen, San Diego, CA). Plasmid pTrcN1-2 was transformed into Escherichia coli strain DH5α (supE44 ΔlacU169
(φ80 lacZ ΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation
cultures were plated on Luria Broth (LB) containing 100 µg/mL ampicillin and, after overnight growth, a single
ampicillin-resistant colony was used to inoculate 5 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16
hours with shaking (225 rpm) at 37°C, this culture was used to inoculate 50 mL of LB containing 100 µg/mL ampicillin
in a 250 mL shaker flask. After growth at 37°C, 225 rpm, for 2 hours post-inoculation, the optical density at 600 nm
was approximately 0.5 OD/mL. IPTG was added to 1mM and the culture allowed to grow for an additional 5-10 hours.
Cells were harvested by centrifugation (5000 rpm, 10 minutes) and stored at -20°C. The 6 histidine tag-collagen
fragment fusion was purified on nickel resin columns. Cell pellets were resuspended in 10 mL of 6M guanidine
hydrochloride/20mM sodium phosphate/500mM NaCl (pH 7.8) and bound in two 5 mL batches to the nickel resin.
Columns were washed two times with 4 mL of binding buffer (8M urea/20mM sodium phosphate/500mM NaCl (pH 7.8)),
two times with wash buffer 1 (8M urea/20mM sodium phosphate/500mM NaCl (pH 6.0)), and two times with wash
buffer 2 (8M urea/20mM sodium phosphate/500mM NaCl (pH 5.3)). The 6 histidine tag-collagen fragment fusion was
eluted from the column with 5 mL of elution buffer (8M urea/20mM sodium phosphate/500mM NaCl (pH 4.0)) in 1 mL
fractions. Fractions were assessed for protein by gel electrophoresis and fusion-containing fractions were
concentrated and stored at -20°C. The yield was typically 15-25 mg/L of culture.

[0202] The collagen is cleaved from the 6 histidine tag with enterokinase. Fusion-containing fractions are dialyzed
against cleavage buffer (50mM Tris-HCl, pH 8.0/5mM CaCl2). After addition of enterokinase at 1 µg enzyme for each
100 µg fusion, the solution is incubated at 37°C for 4-10 hours. Progress of the cleavage is monitored by gel
electrophoresis. The cleaved 6 histidine tag may be separated from the collagen fragment by passage over a nickel resin
column as outlined above.

EXAMPLE 12

Expression in E. coli of Fragments of Human Collagen Type I (α1) with Optimized E. coli Codon Usage

[0203] A plasmid (pN1-3, Figure 47) encoding the gene for the amino terminal 120 amino acids of human collagen
Type I (α1) with optimized E. coli codon usage placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible
tac promoter and also encoding β-lactamase is transformed into Escherichia coli strain DH5α (supE44 ΔlacU169
(φ80 lacZ ΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation
cultures are plated on Luria Broth (LB) containing 100 µg/mL ampicillin and, after overnight growth, a single
ampicillin-resistant colony is used to inoculate 10 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16
hours with shaking (225 rpm) at 37°C, this culture is used to inoculate 1 L of LB containing 100 µg/mL ampicillin in a
1.5 L shaker flask. After growth at 37°C, 225 rpm, for 2 hours post-inoculation, the optical density at 600 nm is
approximately 0.5 OD/mL. IPTG is added to 1mM and the culture allowed to grow for an additional 5-10 hours. Cells are
harvested by centrifugation (5000 rpm, 10 minutes) and lysed by mechanical disruption. Recombinant human collagen
is purified by ammonium sulfate fractionation and column chromatography. The yield is typically 15-25 mg/L of culture.

EXAMPLE 13 

Expression in E. coli of a C-terminal Fragment of Human Collagen Type I (α1) with Optimized E. coli Codon Usage

[0204] A plasmid (pD4, Figure 48) encoding the gene for the carboxy terminal 219 amino acids of human collagen
Type I (α1) with optimized E. coli codon usage placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible
tac promoter and also encoding β-lactamase is transformed into Escherichia coli strain DH5α (supE44 ΔlacU169
(φ80 lacZ ΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation
cultures are plated on Luria Broth (LB) containing 100 µg/mL ampicillin and, after overnight growth, a single
ampicillin-resistant colony is used to inoculate 10 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16
hours with shaking (225 rpm) at 37°C, this culture is used to inoculate 1 L of LB containing 100 µg/mL ampicillin in a
1.5 L shaker flask. After growth at 37°C, 225 rpm, for 2 hours post-inoculation, the optical density at 600 nm is
approximately 0.5 OD/mL. IPTG is added to 1mM and the culture allowed to grow for an additional 5-10 hours. Cells are
harvested by centrifugation (5000 rpm, 10 minutes) and lysed by mechanical disruption. Recombinant human collagen
fragment is purified by ammonium sulfate fractionation and column chromatography. The yield is typically 15-25 mg/L
of culture.

EXAMPLE 14

Construction and Expression in E. coli of the Human Collagen Type I (α2) Gene with Optimized E. coli Codon Usage

A) Construction of the gene:

[0205] The nucleotide sequence of the helical region of human collagen Type I (α2) gene flanked by 11 amino acids
of the amino terminal extra-helical and 12 amino acids of the C-terminal extra-helical region is shown in Figures
49A-49E (SEQ. ID. NO. 29). A tabulation of the codon frequency of this gene is given in Table III below. The gene
sequence shown in Figures 49A-49E was first changed to reflect E. coli codon bias. An initiating methionine was inserted
at the 5' end of the gene and a TAAT stop sequence at the 3' end. Unique restriction sites are identified or created
approximately every 150 base pairs. The resulting gene (HuCol(α2)Ec, Figures 50A-50E) (SEQ. ID. NO. 31) has the codon
usage given in Table IV below. Other sequences that approximate E. coli codon bias are also acceptable.

Table III

Codon   Count      %    Codon   Count      %    Codon   Count      %    Codon   Count      %
TTT-Phe     3   0.28    TCT-Ser    11   1.06    TAT-Tyr     2   0.19    TGT-Cys     0   0.00
TTC-Phe    10   0.96    TCC-Ser     4   0.38    TAC-Tyr     3   0.28    TGC-Cys     0   0.00
TTA-Leu     1   0.09    TCA-Ser     1   0.09    TAA-***     0   0.00    TGA-***     0   0.00
TTG-Leu     2   0.19    TCG-Ser     1   0.09    TAG-***     0   0.00    TGG-Trp     0   0.00
CTT-Leu    16   1.54    CCT-Pro   125  12.06    CAT-His     7   0.67    CGT-Arg    17   1.64
CTC-Leu     9   0.86    CCC-Pro    42   4.05    CAC-His     6   0.57    CGC-Arg     6   0.57
CTA-Leu     2   0.19    CCA-Pro    30   2.89    CAA-Gln    13   1.25    CGA-Arg     6   0.57
CTG-Leu     5   0.48    CCG-Pro     3   0.28    CAG-Gln     9   0.86    CGG-Arg     4   0.38
ATT-Ile    14   1.35    ACT-Thr    14   1.35    AAT-Asn    10   0.96    AGT-Ser    11   1.06
ATC-Ile     3   0.28    ACC-Thr     0   0.00    AAC-Asn    14   1.35    AGC-Ser     4   0.38
ATA-Ile     1   0.09    ACA-Thr     3   0.28    AAA-Lys    15   1.44    AGA-Arg    16   1.54
ATG-Met     5   0.48    ACG-Thr     1   0.09    AAG-Lys    16   1.54    AGG-Arg     6   0.57
GTT-Val    20   1.93    GCT-Ala    82   7.91    GAT-Asp    20   1.93    GGT-Gly   179  17.27
GTC-Val     5   0.48    GCC-Ala    17   1.64    GAC-Asp     5   0.48    GGC-Gly    74   7.14
GTA-Val     3   0.28    GCA-Ala     9   0.86    GAA-Glu    29   2.79    GGA-Gly    80   7.72
GTG-Val    10   0.96    GCG-Ala     0   0.00    GAG-Glu    16   1.54    GGG-Gly    16   1.54

Table IV

Codon   Count      %    Codon   Count      %    Codon   Count      %    Codon   Count      %
CTT-Leu     1   0.09    CCT-Pro    10   0.96    CAT-His     2   0.19    CGT-Arg    37   3.55
CTC-Leu     1   0.09    CCC-Pro     0   0.00    CAC-His    11   1.05    CGC-Arg    18   1.72
CTA-Leu     0   0.00    CCA-Pro    15   1.44    CAA-Gln     7   0.67    CGA-Arg     0   0.00
CTG-Leu    32   3.07    CCG-Pro   177  17.00    CAG-Gln    15   1.44    CGG-Arg     0   0.00
ATT-Ile    11   1.05    ACT-Thr     3   0.28    AAT-Asn     6   0.57    AGT-Ser     0   0.00
ATC-Ile     7   0.67    ACC-Thr     6   0.57    AAC-Asn    18   1.72    AGC-Ser    13   1.24
ATA-Ile     0   0.00    ACA-Thr     0   0.00    AAA-Lys    25   2.40    AGA-Arg     0   0.00
ATG-Met     6   0.57    ACG-Thr    10   0.96    AAG-Lys     6   0.57    AGG-Arg     0   0.00
GTT-Val    18   1.72    GCT-Ala    30   2.88    GAT-Asp    11   1.05    GGT-Gly   209  20.07
GTC-Val     7   0.67    GCC-Ala    21   2.01    GAC-Asp    13   1.24    GGC-Gly   141  13.54
GTA-Val     9   0.85    GCA-Ala    20   1.92    GAA-Glu    33   3.17    GGA-Gly     0   0.00
GTG-Val     6   0.57    GCG-Ala    38   3.65    GAG-Glu    12   1.15    GGG-Gly     0   0.00
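
A codon tabulation of the kind given in Tables III and IV can be produced with a few lines of Python. This is a minimal sketch and not the program used to generate the tables; the function name `codon_usage`, the placeholder sequence, and the rounding of percentages to two decimals are assumptions.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, tuple[int, float]]:
    """Map each codon in an in-frame coding sequence to (count, percent of total)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    total = sum(counts.values())
    return {codon: (n, round(100 * n / total, 2)) for codon, n in counts.items()}


if __name__ == "__main__":
    # Placeholder sequence: Gly-Pro-Gly-Ala.
    for codon, (count, percent) in sorted(codon_usage("GGTCCGGGTGCG").items()):
        print(codon, count, percent)
```

Comparing such a tabulation before and after redesign shows the intended shift, for example from CCT to CCG for proline and from GGA and GGG to GGT and GGC for glycine.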



[0206] Oligos of approximately 80 nucleotides are synthesized on a Beckman Oligo 1000 DNA synthesizer, cleaved
and deprotected with aqueous NH4OH, and purified by electrophoresis in 7M urea/12% polyacrylamide gels. Each set
of oligos is designed to have an EcoR I restriction enzyme site at the 5' end and a unique restriction site near the
3' end, followed by the TAAT stop sequence and a Hind III restriction enzyme site at the very 3' end. Oligos N1-1(α2)
and N1-2(α2) are designed to insert an initiating methionine (ATG) codon at the 5' end of the gene.

[0207] In one instance, oligos N1-1(α2) and N1-2(α2) (1 µg each) (Figure 51 depicts sequence and restriction maps
of synthetic oligos used to construct the first 240 base pairs of human Type I (α2) collagen gene with optimized E. coli
codon usage) are annealed in 20 µL of T7 DNA polymerase buffer (40mM Tris-HCl (pH 8.0), 5mM MgCl2, 5mM
dithiothreitol, 50mM NaCl, 0.05 mg/mL bovine serum albumin) by heating at 90°C for 5 minutes followed by slow cooling
to room temperature. After brief centrifugation at 14,000 rpm, 10 units of T7 DNA polymerase and 2 µL of a solution
of all four dNTPs (dATP, dGTP, dCTP, dTTP, 2.5mM each) are added to the annealed oligos. Extension reactions are
incubated at 37°C for 30 minutes and then heated at 70°C for 10 minutes. After cooling to room temperature, Hind III
buffer (5 µL of 10x concentration), 20 µL of H2O, and 10 units of Hind III restriction enzyme are added and the tubes
incubated at 37°C for 10-16 hours. Hind III buffer (2 µL of 10x concentration), 13.5 µL of 0.5M Tris-HCl (pH 7.5),
1.8 µL of 1% Triton X100, 5.6 µL of H2O, and 20 U of EcoR I are added to each tube and incubation continued for 2
hours at 37°C. Digests are extracted once with an equal volume of phenol, once with phenol/chloroform/isoamyl alcohol,
and once with chloroform/isoamyl alcohol. After ethanol precipitation, the pellet is resuspended in 10 µL of TE buffer
(10mM Tris-HCl (pH 8.0), 1mM EDTA). Resuspended pellet (4 µL) is ligated overnight at 16°C with agarose gel-purified
EcoR I/Hind III-digested pBSKS+ vector (1 µg) using T4 DNA ligase (100 units). One half of the ligation mixture is
transformed by heat shock into DH5α cells and 100 µL of the 1.0 mL transformation mixture is plated on Luria Broth
(LB) agar plates containing 70 µg/mL ampicillin. Plates are incubated overnight at 37°C. Ampicillin-resistant colonies
(6-12) are picked and grown overnight in LB media containing 70 µg/mL ampicillin. Plasmid DNA is isolated from each
culture by Wizard Minipreps (Promega Corporation, Madison, WI) and screened for the presence of the approximately
120 base pair insert by digestion with EcoR I and Hind III and running the digestion products on agarose electrophoresis
gels. Clones with inserts are confirmed by standard dideoxy termination DNA sequencing. The correct clone is named
pBSN1-1(α2) (Figure 52).

[0208] Oligos N1-3(α2) and N1-4(α2) are synthesized, purified, annealed, extended, and cloned into pBSKS+
following the same procedure given above for oligos N1-1(α2) and N1-2(α2). The resulting plasmid is named pBSN1-2A.
To clone together the sections of the collagen gene from pBSN1-1(α2) and pBSN1-2A, plasmid pBSN1-1(α2) (1 µg) is
digested for 2 hours at 37°C with BsrF I and Hind III. The digested vector is purified by agarose gel electrophoresis.
Plasmid pBSN1-2A (3 µg) is digested for 2 hours at 37°C with BsrF I and Hind III and the insert purified by agarose gel
electrophoresis. BsrF I/Hind III-digested pBSN1-1(α2) is ligated with this insert overnight at 16°C with T4 DNA ligase.
One half of the ligation mixture is transformed into DH5α cells and 1/10 of the transformation mixture is plated on LB
agar plates containing 70 µg/mL ampicillin. After overnight incubation at 37°C, ampicillin-resistant clones are picked
and screened for the presence of insert DNA as described above. Clones are confirmed by dideoxy termination
sequencing. The correct clone is named pBSN1-2(α2) (Figure 53) and the collagen fragment has the sequence given in
Figure 54 (SEQ. ID. NO. 37).
[0209] In a similar manner, the remainder of the collagen gene is constructed such that the final DNA sequence is
that given in Figures 50A-50E (SEQ. ID. NO. 31).

B) Expression of the gene in E. coli:

[0210] Following construction of the entire human collagen Type I (α2) gene with codon usage optimized for E. coli,
the cloned gene is expressed in E. coli. A plasmid (pHuCol(α2)Ec, Figure 55) encoding the entire synthetic collagen
gene (Figures 50A-50E) placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac promoter and
also encoding β-lactamase is transformed into Escherichia coli strain DH5α (supE44 ΔlacU169 (φ80 lacZ ΔM15) hsdR17
recA1 endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation cultures are plated on Luria
Broth (LB) containing 100 µg/mL ampicillin and, after overnight growth, a single ampicillin-resistant colony is used to
inoculate 10 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16 hours with shaking (225 rpm)
at 37°C, this culture is used to inoculate 1 L of LB containing 100 µg/mL ampicillin in a 1.5 L shaker flask. After growth
at 37°C, 225 rpm, for 2 hours post-inoculation, the optical density at 600 nm is approximately 0.5 OD/mL. IPTG is
added to 1mM and the culture allowed to grow for an additional 5-10 hours. Cells are harvested by centrifugation (5000
rpm, 10 minutes) and lysed by mechanical disruption. Recombinant human collagen is purified by ammonium sulfate
fractionation and column chromatography. The yield is typically 15-25 mg/L of culture.

EXAMPLE 14A

Alternative Construction and Expression in E. coli of the Human Collagen Type I (α2) Gene with Optimized E. coli
Codon Usage

A) Construction of the gene:

[0211] The nucleotide sequence of the helical region of human collagen Type I (α2) gene flanked by 11 amino acids
of the amino terminal extra-helical and 12 amino acids of the C-terminal extra-helical region is shown in Figures
49A-49E (SEQ. ID. NO. 29). A tabulation of the codon frequency of this gene is given in Table III. The gene sequence
shown in Figures 49A-49E was first changed to reflect E. coli codon bias. An initiating methionine was inserted at the
5' end of the gene and a TAAT stop sequence at the 3' end. Unique restriction sites were identified or created at
appropriate locations in the gene (approximately every 150 base pairs). The resulting gene (HuCol(α2)Ec, Figures
50A-50E) (SEQ. ID. NO. 31) has the codon usage given in Table IV. Other sequences that approximate E. coli codon
bias are also acceptable.

[0212] Oligonucleotides were synthesized on a Beckman Oligo 1000 DNA synthesizer, cleaved and deprotected with
aqueous NH4OH, and purified by electrophoresis in 7M urea/12% polyacrylamide gels. Purified oligos (32.5 pmol) were
dissolved in 20 µL of ligation buffer (Boehringer Mannheim, Cat. No. 1635 379) and annealed by heating to 95°C
followed by slow cooling to 20°C over 45 minutes. The annealed oligonucleotides were ligated for 5 minutes at room
temperature with digested vector (1 µg) using T4 DNA ligase (5 units). One half of the ligation mixture was transformed
by heat shock into DH5α cells and 100 µL of the 1.0 mL transformation mixture plated on Luria Broth (LB) agar plates
containing 70 µg/mL ampicillin. Plates were incubated overnight at 37°C. Ampicillin-resistant colonies (6-12) were
picked and grown overnight in LB media containing 70 µg/mL ampicillin. Plasmid DNA was isolated from each culture by
QIAprep Miniprep (Qiagen, Valencia, CA) and screened for the presence of insert by digestion with flanking restriction
enzymes and running the digestion products on agarose electrophoresis gels. Clones with inserts were confirmed by
standard dideoxy termination DNA sequencing. To clone together the sections of the collagen gene, an insert covering
a flanking portion of the gene was ligated into vector containing the neighboring gene portion. Inserts were isolated
from plasmids and vectors were cut by double digestion for 2 hours at 37°C with the appropriate restriction enzymes.
The digested vector and insert were purified by agarose gel electrophoresis. Insert and vector were ligated for 5 minutes
at room temperature following the procedure in the Rapid DNA Ligation Kit (Boehringer Mannheim). One half of the
ligation mixture was transformed into DH5α cells and 1/10 of the transformation mixture was plated on LB agar plates
containing 70 µg/mL ampicillin. After overnight incubation at 37°C, ampicillin-resistant clones were picked and screened
for the presence of insert DNA as described above. Clones were confirmed by dideoxy termination sequencing.
[0213] In a similar manner, the remainder of the collagen gene was constructed such that the final DNA sequence
is that given in Figures 50A-50E (SEQ. ID. NO. 31).

B) Expression of the gene in E. coli:

[0214] Following construction of the entire human collagen Type I (α2) gene with codon usage optimized for E. coli,
the cloned gene is expressed in E. coli. A plasmid (pHuCol(α2)Ec, Figure 55) encoding the entire collagen gene
(Figures 50A-50E) placed behind the isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac promoter and also
encoding β-lactamase is transformed into Escherichia coli strain DH5α (supE44 ΔlacU169 (φ80 lacZ ΔM15) hsdR17 recA1
endA1 gyrA96 thi-1 relA1) by standard heat shock transformation. Transformation cultures are plated on Luria Broth
(LB) containing 100 µg/mL ampicillin and, after overnight growth, a single ampicillin-resistant colony is used to inoculate
10 mL of LB containing 100 µg/mL ampicillin. After growth for 10-16 hours with shaking (225 rpm) at 37°C, this culture
is used to inoculate 1 L of LB containing 100 µg/mL ampicillin in a 1.5 L shaker flask. After growth at 37°C, 225 rpm,
for 2 hours post-inoculation, the optical density at 600 nm is approximately 0.5 OD/mL. IPTG is added to 1mM and the
culture allowed to grow for an additional 5-10 hours. Cells are harvested by centrifugation (5000 rpm, 10 minutes) and
lysed by mechanical disruption. Recombinant human collagen is purified by ammonium sulfate fractionation and column
chromatography. The yield is typically 15-25 mg/L of culture.

EXAMPLE 15 

Expression in E. coli of Fragments of Human Collagen Type I (α2) with Optimized E. coli Codon Usage

[0215] A plasmid (pN1-2, Figure 56) encoding the gene for the amino terminal 80 amino acids of human collagen
Type I (α2) (SEQ. ID. NO. 31, Fig. 54) with optimized E. coli codon usage placed behind the
isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible tac promoter and also encoding β-lactamase is transformed into
Escherichia coli strain DH5α (supE44 ΔlacU169 (φ80 lacZ ΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1) by standard
heat shock transformation. Transformation cultures are plated on Luria Broth (LB) containing 100 µg/mL ampicillin and,
after overnight growth, a single ampicillin-resistant colony is used to inoculate 10 mL of LB containing 100 µg/mL
ampicillin. After growth for 10-16 hours with shaking (225 rpm) at 37°C, this culture is used to inoculate 1 L of LB
containing 100 µg/mL ampicillin in a 1.5 L shaker flask. After growth at 37°C, 225 rpm, for 2 hours post-inoculation,
the optical density at 600 nm is approximately 0.5 OD/mL. IPTG is added to 1mM and the culture allowed to grow for
an additional 5-10 hours. Cells are harvested by centrifugation (5000 rpm, 10 minutes) and lysed by mechanical
disruption. Recombinant human collagen is purified by ammonium sulfate fractionation and column chromatography.
The yield is typically 15-25 mg/L of culture.

EXAMPLE 16 

Hydroxyproline Incorporation Into Proteins In E. coli Under Proline Starvation Conditions

[0216] Seven plasmids, pGEX-4T.1 (Fig. 73), pTrc-TGF (Fig. 74), pMal-C2 (Fig. 1), pTrc-FN (Fig. 75), pTrc-FN-TGF
(Fig. 76), pTrc-FN-Bmp (Fig. 77) and pGEX-HuCollEc, each separately containing genes encoding the following
proteins: glutathione S-transferase (GST), the mature human TGF-β1 polypeptide (TGF-β1), maltose-binding protein
(MBP), a 70 kDa fragment of human fibronectin (FN), a fusion of FN and TGF-β1 (FN-TGF-β1), a fusion of FN and
human bone morphogenic protein 2A (FN-BMP-2A), and a fusion of GST and collagen (GST-Coll), were used
individually to transform proline auxotrophic E. coli strain JM109 (F-). Transformation cultures were plated on LB agar
containing 100 µg/ml ampicillin. After overnight incubation at 37°C, a single colony from a fresh transformation plate
was used to inoculate 5 ml of LB media containing 400 µg/ml ampicillin. After overnight growth at 37°C, this culture was
centrifuged, the supernatant discarded, and the cell pellet washed twice with 5 ml of M9 medium (1X M9 salts, 0.5%
glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except
proline, and 400 µg/ml ampicillin). The cells were finally resuspended in 5 ml of M9 medium. After incubation with
shaking at 37°C for 30 minutes, trans-4-hydroxyproline was added to 40mM, NaCl to 0.5 M, and
isopropyl-β-D-thiogalactopyranoside to 1.5 mM. In certain cultures one of these additions was not made, as indicated in
the labels for the lanes of the gels. After addition, incubation with shaking at 37°C was continued. After 4 hours, the
cultures were centrifuged, the supernatants discarded, and the cell pellets resuspended in SDS-PAGE sample buffer
(300 mM Tris (pH 6.8)/0.5% SDS/10% glycerol/0.4M β-mercaptoethanol/0.2% bromophenol blue) to 15 OD600nm AU/ml,
placed in a boiling water bath for five minutes, and electrophoresed in denaturing polyacrylamide gels. Proteins in the
gels were visualized by staining with Coomassie Blue R250. The results of the gels are depicted in scans shown in
Figs. 57-59. The scans relating to GST, TGF-β1, MBP, FN, FN-TGF-β1, and FN-BMP-2A (Figs. 57 and 58) show three
lanes relating to each peptide, i.e., one lane indicating +NaCl/+Hyp, wherein NaCl (hyperosmotic) and
trans-4-hydroxyproline are present; one lane indicating -NaCl, wherein trans-4-hydroxyproline is present but NaCl is
not; and one lane indicating -Hyp, wherein NaCl is present but trans-4-hydroxyproline is not. Asterisks on the scans
mark protein bands which correspond to the expressed target protein. The instances in which target protein was
expressed all involve +NaCl in connection with +Hyp, thus demonstrating +NaCl and +Hyp dependence.
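
Resuspending each pellet to a fixed 15 OD600 AU/ml loads a comparable amount of cell material in every lane. A minimal Python sketch of that normalization follows; the function name `sample_buffer_volume_ml` and the example optical density are assumptions, since the text does not report the densities of the individual cultures.

```python
def sample_buffer_volume_ml(od600: float, culture_ml: float, target: float = 15.0) -> float:
    """Volume of sample buffer (ml) that brings a pellet to `target` OD600 AU/ml.

    `od600` is the optical density of the culture before centrifugation and
    `culture_ml` is the culture volume that was pelleted.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    od_units = od600 * culture_ml  # total OD units in the pellet
    return od_units / target


if __name__ == "__main__":
    # Hypothetical 5 ml culture measured at OD600 = 1.5.
    print(sample_buffer_volume_ml(1.5, 5.0), "ml of sample buffer")
```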

[0217] The scan shown in Fig. 59 relating to GST-collagen shows four lanes relating to GST-Coll, i.e., one lane
indicating +Hyp/+NaCl/-IPTG, wherein trans-4-hydroxyproline and NaCl are present but IPTG (the protein expression
inducer) is not and, since there is no inducer, there is no target protein band; one lane indicating +NaCl/+IPTG/-Hyp,
wherein NaCl and IPTG are present but trans-4-hydroxyproline is not and, since trans-4-hydroxyproline is not present,
no target protein band is evident; one lane indicating +NaCl/+Pro/+IPTG, wherein NaCl, proline and IPTG are present
but, since the target protein is not stable when it contains proline, there is no target protein band; and one lane
designated +IPTG/+NaCl/+Hyp, wherein IPTG, NaCl and trans-4-hydroxyproline are present and, since the protein is
stabilized by the presence of trans-4-hydroxyproline, an asterisk-marked protein band is evident.

EXAMPLE 17 

Hydroxyproline incorporation into a collagen-like peptide in E. coli. 

[0218] A plasmid (pGST-CM4, Figure 60) containing the gene for collagen mimetic 4 (CM4, Figure 61) (SEQ. ID.
NO. 39) genetically linked to the 3' end of the gene for S. japonicum glutathione S-transferase was used to transform
by electroporation proline auxotrophic E. coli strain JM109 (F-). Transformation cultures were plated on LB agar
containing 100 µg/ml ampicillin. After overnight incubation at 37°C, a single colony from a fresh transformation plate
was used to inoculate 5 ml of LB media containing 100 µg/ml ampicillin. After overnight growth at 37°C, 500 µl of this
culture was centrifuged, the supernatant discarded, and the cell pellet washed once with 500 µl of M9 medium (1X M9
salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino
acids except proline, and 400 µg/ml ampicillin). The cells were finally resuspended in 5 ml of M9 medium containing 10
µg/ml proline and 2 ml of this was used to inoculate 30 ml of M9 medium containing 10 µg/ml proline. After incubation
with shaking at 37°C for 8 hours, the culture was centrifuged and the cell pellet washed once with M9 medium
containing 5 µg/ml proline. The pellet was resuspended in 15 ml of M9 medium containing 5 µg/ml of proline and this
culture was used to inoculate 1 L of M9 medium containing 5 µg/ml of proline. This culture was grown for 18 hours at
37°C to proline starvation. At this time, the culture was centrifuged, the cells washed once with M9 medium (with no
proline), and the cells resuspended in 1 L of M9 medium containing 80 mM hydroxyproline, 0.5 M NaCl, and 1.5 mM
isopropyl-β-D-thiogalactopyranoside. Incubation was continued at 37°C with shaking for 22 hours. The cultures were
centrifuged and the cell pellets stored at -20°C until processed further.
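
The amounts of solid supplement needed for the induction medium follow from mass = concentration × volume × molar mass. The Python sketch below works this out for the 1 L culture described above. The molar masses are standard values (trans-4-hydroxyproline 131.13 g/mol, NaCl 58.44 g/mol, IPTG 238.30 g/mol); the names used in the code are hypothetical, and the small amount of NaCl already contributed by the M9 salts is ignored as a simplifying assumption.

```python
# Molar masses in g/mol (standard values).
MOLAR_MASS = {"trans-4-hydroxyproline": 131.13, "NaCl": 58.44, "IPTG": 238.30}

# Target concentrations in mol/L, taken from the induction medium described in the text.
TARGET_MOLAR = {"trans-4-hydroxyproline": 0.080, "NaCl": 0.5, "IPTG": 0.0015}


def supplement_masses_g(volume_l: float) -> dict[str, float]:
    """Grams of each supplement needed to reach the target concentration."""
    return {
        name: TARGET_MOLAR[name] * MOLAR_MASS[name] * volume_l
        for name in TARGET_MOLAR
    }


if __name__ == "__main__":
    for name, grams in supplement_masses_g(1.0).items():
        print(f"{name}: {grams:.3f} g")
```

For 1 L this gives roughly 10.5 g of hydroxyproline, 29.2 g of NaCl and 0.36 g of IPTG.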

EXAMPLE 18 

Proline incorporation into a collagen-like peptide in E. coli.

[0219] A plasmid (pGST-CM4, Figure 60) containing the gene for collagen mimetic 4 (CM4, Figure 61) (SEQ. ID.
NO. 39) genetically linked to the 3' end of the gene for S. japonicum glutathione S-transferase was used to transform
proline auxotrophic E. coli strain JM109 (F-) by electroporation. Transformation cultures were plated on LB agar
containing 100 µg/ml ampicillin. After overnight incubation at 37°C, a single colony from a fresh transformation plate
was used to inoculate 5 ml of LB media containing 100 µg/ml ampicillin. After overnight growth at 37°C, 500 µl of this
culture was centrifuged, the supernatant discarded, and the cell pellet washed once with 500 µl of M9 medium (1X M9
salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino
acids except proline, and 400 µg/ml ampicillin). The cells were finally resuspended in 5 ml of M9 medium containing
10 µg/ml proline and 2 ml of this was used to inoculate 30 ml of M9 medium containing 10 µg/ml proline. This culture
was incubated with shaking at 37°C for 8 hours. The culture was centrifuged and the cell pellet washed once with M9
medium containing 5 µg/ml proline. The pellet was resuspended in 15 ml of M9 medium containing 5 µg/ml of proline
and this culture was used to inoculate 1 L of M9 medium containing 5 µg/ml of proline. This culture was grown for 18
hours at 37°C to proline starvation. At this time, the culture was centrifuged, the cells washed once with M9 medium
(with no proline), and finally the cells were resuspended in 1 L of M9 medium containing 2.5 mM proline, 0.5 M NaCl,
and 1.5 mM isopropyl-β-D-thiogalactopyranoside. Incubation was continued at 37°C with shaking for 22 hours. The
cultures were then centrifuged and the cell pellets stored at -20°C until processed further.

EXAMPLE 19 

Purification of hydroxyproline-containing collagen-like peptide from E. coli

[0220] The cell pellet from a 1 L fermentation culture prepared as described in Example 17 above was resuspended
in 20 ml of Dulbecco's phosphate buffered saline (pH 7.1) (PBS) containing 1 mM EDTA, 100 µM PMSF, 0.5 µg/ml
E64, and 0.7 µg/ml pepstatin (resuspension buffer). The cells were lysed by twice passing through a French press.
Following lysis, the suspension was centrifuged for 30 minutes at 30,000 x g. The supernatant was discarded and the
pellet washed once with 5 ml of resuspension buffer containing 1 M urea and 0.5% Triton X100 followed by one wash
with 7 ml of resuspension buffer without urea or Triton X100. The pellet was finally resuspended in 5 ml of 6 M guanidine
hydrochloride in Dulbecco's phosphate buffered saline (pH 7.1) containing 1 mM EDTA and 2 mM β-mercaptoethanol
and sonicated on ice for 3 x 60 seconds (microtip, power = 3.5, Heat Systems XL-2020 model sonicator). The sonicated
suspension was incubated at 4°C for 18 hours and then centrifuged at 14,000 rpm in a microcentrifuge. The supernatant
(6 ml) was dialyzed (10,000 MWCO) against 4 x 4 L of distilled water at 4°C. The contents of the dialysis tubing were
transferred to a 150 ml round bottom flask and lyophilized to dryness. The residue (~30 mg) was dissolved in 3 ml of
70% formic acid and 40 mg of cyanogen bromide was added. The flask was flushed once with nitrogen, evacuated,
and allowed to stir for 18 hours at room temperature. The contents of the flask were taken to dryness in vacuo at room
temperature, the residue resuspended in 5 ml of distilled water and evaporated to dryness again. This was repeated
2 times. The residue was finally dissolved in 2 ml of 0.2% trifluoroacetic acid (TFA). The trifluoroacetic acid-soluble
material was applied in 100 µl aliquots to a Poros R2 column (4.6 mm x 100 mm) running at 5 ml/min. with a starting
buffer of 98% 0.1% trifluoroacetic acid in water/2% 0.1% TFA in acetonitrile. The hydroxyproline-containing protein
was eluted with a gradient of 2% 0.1% TFA/acetonitrile to 40% 0.1% TFA/acetonitrile over 25 column volumes (Fig.
62A). The collagen-mimetic eluted between 18 and 23% 0.1% TFA/acetonitrile. Figure 62A is a chromatogram of the
elution of hydroxyproline containing CM4 from a Poros RP2 column (available from Perseptive Biosystems, Framingham,
MA). The arrow indicates the peak containing hydroxyproline containing CM4. Fractions were assayed by SDS-PAGE
and collagen mimetic-containing fractions were pooled and lyophilized. Lyophilized material was stored at -20°C.
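The gradient of Example 19 is specified in column volumes, so a short calculation helps translate it into elution volume and time. The sketch below assumes that one column volume equals the empty-cylinder volume of the 4.6 mm x 100 mm column (packing is ignored) and that the gradient is linear; both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def column_volume_ml(inner_diameter_mm, length_mm):
    """Empty-cylinder volume of the column in ml (1 ml = 1000 mm^3)."""
    radius_mm = inner_diameter_mm / 2.0
    return math.pi * radius_mm ** 2 * length_mm / 1000.0


def percent_b_at(volume_ml, cv_ml, start_pct=2.0, end_pct=40.0, gradient_cv=25.0):
    """Percent of the acetonitrile buffer after `volume_ml` of a linear gradient."""
    fraction = min(max(volume_ml / (gradient_cv * cv_ml), 0.0), 1.0)
    return start_pct + fraction * (end_pct - start_pct)


cv = column_volume_ml(4.6, 100.0)   # about 1.66 ml under the stated assumption
gradient_ml = 25.0 * cv             # total gradient volume
gradient_min = gradient_ml / 5.0    # duration at the 5 ml/min flow rate

if __name__ == "__main__":
    print(f"column volume: {cv:.2f} ml")
    print(f"gradient: {gradient_ml:.1f} ml over {gradient_min:.1f} min")
    print(f"buffer B at mid-gradient: {percent_b_at(gradient_ml / 2, cv):.1f}%")
```

Under these assumptions the whole 2% to 40% gradient takes roughly eight minutes, which is consistent with the high flow rate described for the Poros column.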

EXAMPLE 20 

Purification of proline-containing collagen-like peptide from E. coli

[0221] The cell pellet from a 500 ml fermentation culture prepared as described in Example 18 above was resuspended
in 20 ml of Dulbecco's phosphate buffered saline (pH 7.1) (PBS) containing 10 mM EDTA, 100 µM PMSF, 0.5
µg/ml E64, and 0.06 µg/ml aprotinin. Lysozyme (2 mg) was added and the suspension incubated at 4°C for 60 minutes.
The suspension was sonicated for 5 x 60 seconds (microtip, power = 3.5, Heat Systems XL-2020 model sonicator).
The sonicated suspension was centrifuged at 20,000 x g for 15 minutes. The supernatant was adjusted to 1% Triton
X100 and incubated for 30 minutes at room temperature with 7 ml of glutathione sepharose 4B pre-equilibrated in PBS.
The suspension was centrifuged at 500 rpm for 3 minutes. The supernatant was decanted, and the resin washed 3 times
with 8 ml of PBS. Bound proteins were eluted with 3 aliquots (2 ml each, 10 minutes gentle rocking at room temperature)
of 10 mM glutathione in 50 mM Tris (pH 8.0). Eluates were combined and dialyzed (10,000 MWCO) against 3 x 4 L of
distilled water at 4°C. The contents of the dialysis tubing were transferred to a 150 ml round bottom flask and lyophilized
to dryness. The residue was dissolved in 3 ml of 70% formic acid and 4 mg of cyanogen bromide was added. The flask
was flushed once with nitrogen, evacuated, and allowed to stir for 18 hours at room temperature. The contents of the
flask were taken to dryness in vacuo at room temperature, the residue resuspended in 5 ml of distilled water, and
evaporated to dryness again. This was repeated 2 times. The residue was finally dissolved in 2 ml of 0.2% trifluoroacetic
acid (TFA). The trifluoroacetic acid-soluble material was applied in 100 µl aliquots to a Poros R2 column (4.6 mm x
100 mm) running at 5 ml/min. with a starting buffer of 98% 0.1% trifluoroacetic acid in water/2% 0.1% TFA in acetonitrile.
Bound protein was eluted with a gradient of 2% 0.1% TFA/acetonitrile to 40% 0.1% TFA/acetonitrile over 25 column
volumes (Figure 62B). The collagen-mimetic eluted between 24 and 27% 0.1% TFA/acetonitrile. Figure 62B is a
chromatogram of the elution of proline containing CM4 from a Poros RP2 column. The arrow indicates the peak containing
proline containing CM4. Fractions were assayed by SDS-PAGE and collagen mimetic-containing fractions were pooled
and lyophilized. Lyophilized material was stored at -20°C.

EXAMPLE 21

Amino acid analysis of hydroxyproline-containing collagen mimetic and proline-containing collagen mimetic.

[0222] Approximately 30 µg of purified hydroxyproline-containing collagen mimetic and proline-containing collagen
mimetic prepared as described in Examples 19 and 20, respectively, were dissolved in 250 µl of 6N hydrochloric acid
in glass ampules. The ampules were flushed two times with nitrogen, sealed under vacuum, and incubated at 110°C
for 23 hours. Following hydrolysis, samples were removed from the ampules and taken to dryness in vacuo. The
samples were dissolved in 15 µl of 0.1 N hydrochloric acid and subjected to amino acid analysis on a Hewlett Packard
AminoQuant 1090 amino acid analyzer utilizing standard OPA and FMOC derivatization chemistry. Examples of the
results of the amino acid analysis that illustrate the region of the chromatograms where the secondary amino acids
(proline and hydroxyproline) elute are shown in Figures 63A through 63D. These Figures also show chromatograms
of proline and hydroxyproline amino acid standards. More particularly, Figure 63A depicts a chromatogram of a proline
amino acid standard (250 pmol); * indicates a contaminating peak. Figure 63B depicts a chromatogram of a
hydroxyproline amino acid standard (250 pmol); * indicates a contaminating peak. Figure 63C depicts an amino acid
analysis chromatogram of the hydrolysis of proline-containing CM4. Only the region of the chromatogram where proline
and hydroxyproline elute is shown; * indicates a contaminating peak. Figure 63D depicts an amino acid analysis
chromatogram of the hydrolysis of hydroxyproline-containing CM4. Only the region of the chromatogram where proline
and hydroxyproline elute is shown; * indicates a contaminating peak.

EXAMPLE 22 

Determination of proline starvation conditions for E. coli (strain JM109 (F-))

[0223] A plasmid (pGST-CM4, Figure 60) containing the gene for collagen mimetic 4 (CM4, Figure 61) genetically
linked to the 3' end of the gene for S. japonicum glutathione S-transferase was used to transform proline auxotrophic
E. coli strain JM109 (F-) by electroporation. Transformation cultures were plated on LB agar containing 100 µg/ml
ampicillin. After overnight incubation at 37°C, a single colony from a fresh transformation plate was used to inoculate
2 ml of M9 media (1X M9 salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine,
100 µg/ml of the other amino acids except proline, and 200 µg/ml carbenicillin) containing 20 µg/ml proline. After
growth at 37°C with shaking for 8 hours, 1.5 ml was used to inoculate 27 ml of M9 media containing 45 µg/ml proline.
After incubation at 37°C with shaking for 7 hours, the culture was centrifuged, the cell pellet washed with 7 ml of M9
media with no proline, and finally resuspended in 17 ml of M9 media with no proline. This culture was used to inoculate
four 35 ml cultures of M9 media containing 4 µg/ml proline at an OD600 of 0.028. Cultures were incubated with shaking
at 37°C and the OD600 monitored. After 13.5 hours growth, the OD600 had plateaued. At this time, one culture was
supplemented with proline at 15 µg/ml, one with hydroxyproline at 15 µg/ml, one with all of the amino acids at 15 µg/ml
except proline and hydroxyproline, and one culture with nothing. Incubation was continued and the OD600 monitored
for a total of 24 hours. Figure 64 is a graph of OD600 vs. time for cultures of JM109 (F-) grown to plateau and then
supplemented with various amino acids. The point at which the cultures were supplemented is indicated with an arrow.
Proline starvation is evident since only the culture supplemented with proline continued to grow past plateau.
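The plateau criterion used in Example 22 can be made explicit with a small helper that scans a series of OD600 readings for the first run of readings that no longer change appreciably. This is only an illustrative sketch: the tolerance, the window size and the example readings are made-up values and are not data from Figure 64.

```python
def plateau_index(od_readings, tolerance=0.02, window=3):
    """Return the index where `window` consecutive readings first change by
    less than `tolerance` (relative) from one reading to the next, else None."""
    for start in range(len(od_readings) - window + 1):
        segment = od_readings[start:start + window]
        changes = [
            abs(later - earlier) / earlier
            for earlier, later in zip(segment, segment[1:])
        ]
        if all(change < tolerance for change in changes):
            return start
    return None


# Hypothetical OD600 series starting from the inoculation density of 0.028.
EXAMPLE_READINGS = [0.028, 0.06, 0.13, 0.25, 0.40, 0.48, 0.485, 0.486, 0.486]

if __name__ == "__main__":
    print("plateau begins at reading index", plateau_index(EXAMPLE_READINGS))
```

A culture that resumes growth after supplementation, as the proline-supplemented culture does, would show readings after the plateau that again exceed the tolerance.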

EXAMPLE 23 

Hydroxyproline Incorporation Into Type I (α1) Collagen in E. coli

[0224] A plasmid (pHuCol(α1)Ec, Figure 65) containing the gene for Type I (α1) collagen with optimized E. coli codon
usage (Figure 39A-39E) (SEQ. ID. NO. 19) under control of the tac promoter and containing the gene for chloramphenicol
resistance was used to transform proline auxotrophic E. coli strain JM109 (F-) by electroporation. Transformation
cultures were plated on LB agar containing 20 µg/ml chloramphenicol. After overnight incubation at 37°C, a
single colony from a fresh transformation plate was used to inoculate 100 ml of LB media containing 20 µg/ml
chloramphenicol. This culture was grown to an OD600nm of 0.5 and 100 µl aliquots transferred to 1.5 ml tubes. The tubes
were stored at -80°C. For expression, a tube was thawed on ice and used to inoculate 25 ml of LB media containing
20 µg/ml chloramphenicol. After overnight growth at 37°C, a four ml aliquot was withdrawn, centrifuged, the cell pellet
washed once with 1 ml of 2x YT media containing 20 µg/ml chloramphenicol, and the washed cells used to inoculate
1 L of 2x YT medium containing 20 µg/ml chloramphenicol. This culture was grown at 37°C to an OD600nm of 0.8.
The culture was centrifuged and the cell pellet washed once with 100 ml of M9 medium (1X M9 salts, 0.5% glucose,
1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline,
and 20 µg/ml chloramphenicol). The cells were resuspended in 910 ml of M9 medium (1X M9 salts, 0.5% glucose,
1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline,
and 20 µg/ml chloramphenicol) and allowed to grow at 37°C for 30 minutes. NaCl (80 ml of 5 M), hydroxyproline (7.5
ml of 2 M), and IPTG (500 µl of 1 M) were added and growth continued for 3 hours. Cells were harvested by centrifugation
and stored at -20°C.
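The stock additions in Example 23 imply the final induction concentrations only indirectly. The following sketch works them out by simple dilution arithmetic, assuming that the volumes are additive and that the 910 ml culture volume is unchanged at the time of addition; the function name is hypothetical.

```python
def final_concentrations(base_volume_ml, additions):
    """Compute final molar concentrations after adding stock solutions.

    additions maps a name to (stock volume in ml, stock concentration in mol/L).
    Returns (total volume in ml, {name: final concentration in mol/L}).
    """
    total_ml = base_volume_ml + sum(volume for volume, _ in additions.values())
    finals = {
        name: volume * stock_molar / total_ml
        for name, (volume, stock_molar) in additions.items()
    }
    return total_ml, finals


# Stock additions as stated in Example 23.
ADDITIONS = {
    "NaCl": (80.0, 5.0),
    "hydroxyproline": (7.5, 2.0),
    "IPTG": (0.5, 1.0),
}

total_ml, conc = final_concentrations(910.0, ADDITIONS)

if __name__ == "__main__":
    print(f"total volume: {total_ml:.1f} ml")
    for name, molar in conc.items():
        print(f"{name}: {molar * 1000:.2f} mM")
```

Under these assumptions the additions give about 0.4 M NaCl, 15 mM hydroxyproline and 0.5 mM IPTG in a total of 998 ml.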

EXAMPLE 24

Hydroxyproline Incorporation Into Type I (α2) in E. coli

[0225] A plasmid (pHuCol(α2)Ec, Figure 66) containing the gene for Type I (α2) collagen with optimized E. coli codon
usage (Figure 50A-50E) (SEQ. ID. NO. 31) under control of the tac promoter and containing the gene for chloramphenicol
resistance was used to transform proline auxotrophic E. coli strain JM109 (F-) by electroporation. Transformation
cultures were plated on LB agar containing 20 µg/ml chloramphenicol. After overnight incubation at 37°C, a
single colony from a fresh transformation plate was used to inoculate 100 ml of LB media containing 20 µg/ml
chloramphenicol. This culture was grown to an OD600nm of 0.5 and 100 µl aliquots transferred to 1.5 ml tubes. The tubes
were stored at -80°C. For expression, a tube was thawed on ice and used to inoculate 25 ml of LB media containing
20 µg/ml chloramphenicol. After overnight growth at 37°C, a four ml aliquot was withdrawn, centrifuged, the cell pellet
washed once with 1 ml of 2x YT media containing 20 µg/ml chloramphenicol, and the washed cells used to inoculate
1 L of 2x YT medium containing 20 µg/ml chloramphenicol. This culture was grown at 37°C to an OD600nm of 0.8.
The culture was centrifuged and the cell pellet washed once with 100 ml of M9 medium (1X M9 salts, 0.5% glucose,
1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline,
and 20 µg/ml chloramphenicol). The cells were resuspended in 910 ml of M9 medium (1X M9 salts, 0.5% glucose,
1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline,
and 20 µg/ml chloramphenicol) and allowed to grow at 37°C for 30 minutes. NaCl (80 ml of 5 M), hydroxyproline (7.5
ml of 2 M), and IPTG (500 µl of 1 M) were added and growth continued for 3 hours. Cells were harvested by centrifugation
and stored at -20°C.

EXAMPLE 25 

Hydroxyproline Incorporation Into a C-terminal Fragment of Type I (α1) Collagen in E. coli

[0226] A plasmid (pD4-α1, Figure 67) encoding the gene for the carboxy terminal 219 amino acids of human Type
I (α1) collagen with optimized E. coli codon usage fused to the 3'-end of the gene for glutathione S-transferase and
under control of the tac promoter and containing the gene for ampicillin resistance was used to transform proline
auxotrophic E. coli strain JM109 (F-) by electroporation. Transformation cultures were plated on LB agar containing 100
µg/ml ampicillin. After overnight incubation at 37°C, a single colony from a fresh transformation plate was used to
inoculate 100 ml of LB media containing 100 µg/ml ampicillin. This culture was grown to an OD600nm of 0.5 and 100
µl aliquots transferred to 1.5 ml tubes. The tubes were stored at -80°C. For expression, a tube was thawed on ice and
used to inoculate 25 ml of LB media containing 400 µg/ml ampicillin. After overnight growth at 37°C, a four ml aliquot
was withdrawn, centrifuged, the cell pellet washed once with 1 ml of 2x YT media containing 400 µg/ml ampicillin, and
the washed cells used to inoculate 1 L of 2x YT medium containing 400 µg/ml ampicillin. This culture was grown at
37°C to an OD600nm of 0.8. The culture was centrifuged and the cell pellet washed once with 100 ml of M9 medium
(1X M9 salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the
other amino acids except proline, and 400 µg/ml ampicillin). The cells were resuspended in 910 ml of M9 medium (1X
M9 salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml glycine, 200 µg/ml alanine, 100 µg/ml of the other
amino acids except proline, and 400 µg/ml ampicillin) and allowed to grow at 37°C for 30 minutes. NaCl (80 ml of 5
M), hydroxyproline (7.5 ml of 2 M), and IPTG (500 µl of 1 M) were added and growth continued for 3 hours. Cells were
harvested by centrifugation and stored at -20°C.

EXAMPLE 26

Hydroxyproline Incorporation Into a C-terminal Fragment of Type I (α2) Collagen in E. coli

[0227] A plasmid (pD4-α2, Figure 68) encoding the gene for the carboxy terminal 219 amino acids of human Type
I (α2) collagen with optimized E. coli codon usage as constructed in accordance with Example 14A fused to the 3'-end
of the gene for glutathione S-transferase and under control of the tac promoter and containing the gene for ampicillin
resistance was used to transform proline auxotrophic E. coli strain JM109 (F-) by electroporation. Transformation
cultures were plated on LB agar containing 100 µg/ml ampicillin. After overnight incubation at 37°C, a single colony from
a fresh transformation plate was used to inoculate 100 ml of LB media containing 100 µg/ml ampicillin. This culture
was grown to an OD600nm of 0.5 and 100 µl aliquots transferred to 1.5 ml tubes. The tubes were stored at -80°C.
For expression, a tube was thawed on ice and used to inoculate 25 ml of LB media containing 400 µg/ml ampicillin.
After overnight growth at 37°C, a four ml aliquot was withdrawn, centrifuged, the cell pellet washed once with 1 ml of
2x YT media containing 400 µg/ml ampicillin, and the washed cells used to inoculate 1 L of 2x YT medium containing
400 µg/ml ampicillin. This culture was grown at 37°C to an OD600nm of 0.8. The culture was centrifuged and the cell
pellet washed once with 100 ml of M9 medium (1X M9 salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml
glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline, and 400 µg/ml ampicillin). The cells
were resuspended in 910 ml of M9 medium (1X M9 salts, 0.5% glucose, 1 mM MgCl2, 0.01% thiamine, 200 µg/ml
glycine, 200 µg/ml alanine, 100 µg/ml of the other amino acids except proline, and 400 µg/ml ampicillin) and allowed
to grow at 37°C for 30 minutes. NaCl (80 ml of 5 M), hydroxyproline (7.5 ml of 2 M), and IPTG (500 µl of 1 M) were
added and growth continued for 3 hours. Cells were harvested by centrifugation and stored at -20°C.

EXAMPLE 27 

Purification of Hydroxyproline-containing C-terminal Fragment of Type I (α1) Collagen

[0228] Cell paste harvested from a 1 L culture grown as in Example 25 was resuspended in 30 ml of lysis buffer (2 M
urea, 137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM KH2PO4, 10 mM EDTA, 10 mM βME, 0.1% Triton X-100, pH
7.4) at 4°C. Lysozyme (chicken egg white) was added to 100 µg/ml and the solution incubated at 4°C for 30 minutes.
The solution was passed twice through a cell disruption press (SLM Instruments, Rochester, NY) and then centrifuged
at 30,000 x g for 30 minutes. The pellet was resuspended in 30 ml of 50 mM Tris-HCl, pH 7.6, centrifuged at 30,000
x g for 30 minutes, and the pellet solubilized in 25 ml of solubilization buffer (8 M urea, 137 mM NaCl, 2.7 mM KCl, 4.3 mM
Na2HPO4, 1.4 mM KH2PO4, 5 mM EDTA, 5 mM βME). The solution was centrifuged at 30,000 x g for 30 minutes and the
supernatant dialyzed against two changes of 4 L of distilled water at 4°C. Following dialysis, the entire mixture was
lyophilized. The lyophilized solid was dissolved in 0.1 M HCl in a flask with stirring. After addition of a 5-fold excess of
crystalline BrCN, the flask was evacuated and filled with nitrogen. Cleavage was allowed to proceed for 24 hrs, at
which time the solvent was removed in vacuo. The residue was dissolved in 0.1% trifluoroacetic acid (TFA) and purified
by reverse-phase HPLC using a Vydac C4 RP-HPLC column (10 x 250 mm, 5 µ, 300 Å) on a BioCad Sprint system
(PerSeptive Biosystems, Framingham, MA). Hydroxyproline-containing D4 protein was eluted with a gradient of 15-40%
acetonitrile/0.1% TFA over a 45 minute period. Protein D4-α1 eluted at 26% acetonitrile/0.1% TFA.

EXAMPLE 28 

Purification of Hydroxyproline-containing C-terminal Fragment of Type I (α2) Collagen

[0229] Cell paste harvested from a 1 L culture grown as in Example 26 was resuspended in 30 ml of lysis buffer (2 M
urea, 137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM KH2PO4, 10 mM EDTA, 10 mM βME, 0.1% Triton X-100, pH
7.4) at 4°C. Lysozyme (chicken egg white) was added to 100 µg/ml and the solution incubated at 4°C for 30 minutes.
The solution was passed twice through a cell disruption press (SLM Instruments, Rochester, NY) and then centrifuged
at 30,000 x g for 30 minutes. The pellet was resuspended in 30 ml of 50 mM Tris-HCl, pH 7.6, centrifuged at 30,000
x g for 30 minutes, and the pellet solubilized in 25 ml of solubilization buffer (8 M urea, 137 mM NaCl, 2.7 mM KCl, 4.3 mM
Na2HPO4, 1.4 mM KH2PO4, 5 mM EDTA, 5 mM βME). The solution was centrifuged at 30,000 x g for 30 minutes and the
supernatant dialyzed against two changes of 4 L of distilled water at 4°C. Following dialysis, the entire mixture was
lyophilized. The lyophilized solid was dissolved in 0.1 M HCl in a flask with stirring. After addition of a 5-fold excess of
crystalline BrCN, the flask was evacuated and filled with nitrogen. Cleavage was allowed to proceed for 24 hrs, at
which time the solvent was removed in vacuo. The residue was dissolved in 0.1% trifluoroacetic acid (TFA) and purified
by reverse-phase HPLC using a Vydac C4 RP-HPLC column (10 x 250 mm, 5 µ, 300 Å) on a BioCad Sprint system
(PerSeptive Biosystems, Framingham, MA). Hydroxyproline-containing D4 protein was eluted with a gradient of 15-40%
acetonitrile/0.1% TFA over a 45 minute period. Protein D4-α2 eluted at 25% acetonitrile/0.1% TFA.

EXAMPLE 29 

Amino Acid Composition Analysis of Hydroxyproline-containing C-terminal Fragment of Type I (α1) Collagen

[0230] Protein D4-α1 (10 µg) purified as in Example 27 was taken to dryness in vacuo in a 1.5 ml microcentrifuge
tube. A sample was subjected to amino acid analysis at the W.M. Keck Foundation Biotechnology Resource Laboratory
(New Haven, CT) on an Applied Biosystems sequencer equipped with an on-line HPLC system. The experimentally
determined sequence of the first 13 amino acids (SEQ. ID. NO. 41) and the sequence predicted from the DNA sequence
(SEQ. ID. NO. 42) are shown in Figure 69. A sample of protein D4-α1 was subjected to mass spectral analysis on a
VG Biotech BIO-Q quadrupole analyzer at M-Scan, Inc. (West Chester, PA). The mass spectrum and the predicted
molecular weight of protein D4-α1 if it contained 100% hydroxyproline in lieu of proline are given in Figure 70. The
predicted molecular weight of protein D4-α1 containing 100% hydroxyproline in lieu of proline is 20807.8 Da. The
experimentally determined molecular weight was 20807.5 Da.
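The agreement between the predicted and measured molecular weights in Example 29 can be expressed as an absolute and a relative error. The sketch below does this and also shows how the mass shift for complete replacement of proline by hydroxyproline scales with the number of proline positions; the oxygen mass of 15.999 Da is a standard average atomic mass not stated in this description, and the proline count passed in the usage line is purely hypothetical.

```python
OXYGEN_MASS_DA = 15.999  # assumed average atomic mass of oxygen


def mass_error(predicted_da, observed_da):
    """Return (difference in Da, difference in parts per million)."""
    delta = observed_da - predicted_da
    return delta, delta / predicted_da * 1e6


def hydroxylation_shift(num_prolines):
    """Mass added when every proline carries one extra oxygen (hydroxyproline)."""
    return num_prolines * OXYGEN_MASS_DA


if __name__ == "__main__":
    delta, ppm = mass_error(20807.8, 20807.5)
    print(f"observed - predicted: {delta:.1f} Da ({ppm:.1f} ppm)")
    # Hypothetical proline count, for illustration only.
    print(f"shift for 10 prolines: {hydroxylation_shift(10):.2f} Da")
```

A difference of 0.3 Da is far smaller than the roughly 16 Da contributed by a single hydroxyl group, which is why the measurement supports essentially complete substitution.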

EXAMPLE 30

Construction of Carboxy Terminal 219 Amino Acids of Human Collagen Type I (α1) Fragment Gene with Optimized E.
coli Codon Usage.

[0231] The nucleotide sequence of the 657 nucleotide gene for the carboxy terminal 219 amino acids of human Type
I (α1) collagen with optimized E. coli codon usage is shown in Figure 71. For synthesis of this gene, unique restriction
sites were identified or created approximately every 150 base pairs. Oligos of approximately 80 nucleotides were
synthesized on a Beckman Oligo 1000 DNA synthesizer, cleaved and deprotected with aqueous NH4OH, and purified
by electrophoresis in 7 M urea/12% polyacrylamide gels. Each set of oligos was designed to have an EcoR I restriction
enzyme site at the 5' end, a unique restriction site near the 3' end, followed by the TAAT stop sequence and a Hind III
restriction enzyme site at the very 3' end. The first four oligos, comprising the first 84 amino acids of the carboxy terminal
219 amino acids of human Type I (α1) collagen with optimized E. coli codon usage, are given in Figure 81 (SEQ. ID.
NOS. 47-50).

[0232] Oligos N4-1 (SEQ. ID. NO. 47) and N4-2 (SEQ. ID. NO. 48) (1 µg each) were annealed in 20 µL of T7 DNA
polymerase buffer (40 mM Tris-HCl (pH 8.0), 5 mM MgCl2, 5 mM dithiothreitol, 50 mM NaCl, 0.05 mg/mL bovine serum
albumin) by heating at 90°C for 5 minutes followed by slow cooling to room temperature. After brief centrifugation at
14,000 rpm, 10 units of T7 DNA polymerase and 2 µL of a solution of all four dNTPs (dATP, dGTP, dCTP, dTTP, 2.5 mM
each) were added to the annealed oligos. Extension reactions were incubated at 37°C for 30 minutes and then heated
at 70°C for 10 minutes. After cooling to room temperature, Hind III buffer (5 µL of 10x concentration), 20 µL of H2O,
and 10 units of Hind III restriction enzyme were added and the tubes incubated at 37°C for 10 hours. Hind III buffer (2
µL of 10x concentration), 13.5 µL of 0.5 M Tris HCl (pH 7.5), 1.8 µL of 1% Triton X100, 5.6 µL of H2O, and 20 U of EcoR
I were added to each tube and incubation continued for 2 hours at 37°C. Digests were extracted once with an equal
volume of phenol, once with phenol/chloroform/isoamyl alcohol, and once with chloroform/isoamyl alcohol. After
ethanol precipitation, the pellet was resuspended in 10 µL of TE buffer (10 mM Tris HCl (pH 8.0), 1 mM EDTA). Resuspended
pellet (4 µL) was ligated overnight at 16°C with agarose gel-purified EcoRI/Hind III digested pBSKS+ vector (1 µg)
using T4 DNA ligase (100 units). One half of the ligation mixture was transformed by heat shock into DH5α
cells and 100 µL of the 1.0 mL transformation mixture was plated on Luria Broth (LB) agar plates containing 70 µg/mL
ampicillin. Plates were incubated overnight at 37°C. Ampicillin resistant colonies (6-12) were picked and grown
overnight in LB media containing 70 µg/mL ampicillin. Plasmid DNA was isolated from each culture by Wizard Minipreps
(Promega Corporation, Madison WI) and screened for the presence of the approximately 120 base pair insert by
digestion with EcoRI and Hind III and running the digestion products on agarose electrophoresis gels. Clones with inserts
were confirmed by standard dideoxy termination DNA sequencing. The correct clone was named pBSN4-1.

[0233] Oligos N4-3 (SEQ. ID. NO. 49) and N4-4 (SEQ. ID. NO. 50) (Figure 81) were synthesized, purified, annealed,
extended, and cloned into pBSKS+ following exactly the same procedure given above for oligos N4-1 and N4-2. The
resulting plasmid was named pBSN4-2A. To clone together the sections of the collagen gene from pBSN4-1 and
pBSN4-2A, plasmid pBSN4-1 (1 µg) was digested for 2 hours at 37°C with Apa L1 and Hind III. The digested vector
was purified by agarose gel electrophoresis. Plasmid pBSN4-2A (3 µg) was digested for 2 hours at 37°C with Apa L1
and Hind III and the insert purified by agarose gel electrophoresis. Apa L1/Hind III-digested pBSN4-1 was ligated with
this insert overnight at 16°C with T4 DNA ligase. One half of the ligation mixture was transformed into DH5α cells and
1/10 of the transformation mixture was plated on LB agar plates containing 70 µg/mL ampicillin. After overnight
incubation at 37°C, ampicillin-resistant clones were picked and screened for the presence of insert DNA as described
above. Clones were confirmed by dideoxy termination sequencing. The correct clone was named pBSN4-2.

[0234] In a similar manner, the remainder of the gene for the carboxy terminal 219 amino acids of human Type I (α1)
collagen with optimized E. coli codon usage was constructed such that the final DNA sequence is that given in Figure
71 (SEQ. ID. NO. 43).
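The figures quoted in Example 30 can be cross-checked with simple arithmetic: 219 codons correspond to 657 nucleotides, and restriction sites spaced roughly every 150 base pairs divide the gene into about five cloning segments. The sketch below is only a consistency check; the segment count is an estimate because the spacing is stated as approximate, and the function names are hypothetical.

```python
import math


def coding_length_nt(num_amino_acids):
    """Nucleotides needed to encode the given number of amino acids (no stop codon)."""
    return 3 * num_amino_acids


def segments_needed(gene_length_nt, segment_length_nt=150):
    """Estimated number of segments when sites occur about every segment_length_nt."""
    return math.ceil(gene_length_nt / segment_length_nt)


if __name__ == "__main__":
    gene_nt = coding_length_nt(219)
    print("gene length:", gene_nt, "nucleotides")
    print("estimated segments:", segments_needed(gene_nt))
    print("first four oligos cover:", coding_length_nt(84), "nucleotides")
```

The 252 nucleotides covered by the first four oligos, split over two cloned inserts, are in line with the approximately 120 base pair insert screened for in paragraph [0232].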

[0235] It will be understood that various modifications may be made to the embodiments disclosed herein. For
example, it is contemplated that any protein produced by prokaryotes and eukaryotes can be made to incorporate one
or more amino acid analogs in accordance with the present disclosure. Therefore, the above description should not
be construed as limiting, but merely as exemplifications of preferred embodiments. Those skilled in the art will envision
other modifications within the scope and spirit of the claims appended hereto.

Annex to the description
[0236] 






SEQUENCE LISTING 



(1) GENERAL INFORMATION: 



(i) APPLICANT: GRUSKIN, ELLIOT A.
               BUECHTER, DOUGLAS
               BROKAW, JANE
               ZHANG, GUANGHUI
               PAOLELLA, DAVID

(ii) TITLE OF INVENTION: AMINO ACID MODIFIED POLYPEPTIDES

(iii) NUMBER OF SEQUENCES: 50

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: DILWORTH & BARRESE
(B) STREET: 333 EARLE OVINGTON BOULEVARD
(C) CITY: UNIONDALE
(D) STATE: NY
(E) COUNTRY: U.S.A.
(F) ZIP: 11553

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER:
(B) FILING DATE:
(C) CLASSIFICATION:



(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: STEEN, JEFFREY S

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (516) 228-8484
(B) TELEFAX: (516) 228-8516

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3170 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CAGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC TGGCCCCATG 60 

GGTCCCTCTG GTCCTCGTGG TCTCCCTGGC CCCCCTGGTG CACCTGGTCC CCAAGGCTTC 120 

CAAGGTCCCC CTGGTGAGCC TGGCGAGCCT GGAGCTTCAG GTCCCATGGG TCCCCGAGGT 180 

CCCCCAGGTC CCCCTGGAAA GAATGGAGAT GATGGGGAAG CTGGAAAACC TGGTCGTCCT 240 

GGTGAGCGTG GGCCTCCTGG GCCTCAGGGT GCTCGAGGAT TGCCCGGAAC AGCTGGCCTC 300

CCTGGAATGA AGGGACACAG AGGTTTCAGT GGTTTGGATG GTGCCAAGGG AGATGCTGGT 360 

CCTGCTGGTC CTAAGGGTGA GCCTGGCAGC CCTGGTGAAA ATGGAGCTCC TGGTCAGATG 420 
TTCCAGGGTC TCCCTGGTCC TGCTGGTCCT CCAGGTGAAG CAGGCAAACC TGGTGAACAG 1500

GGTGTTCCTG GAGACCTTGG CGCCCCTGGC CCCTCTGGAG CAAGAGGCGA GAGAGGTTTC 1560

CCTGGCGAGC GTGGTGTGCA AGGTCCCCCT GGTCCTGCTG GACCCCGAGG GGCCAACGGT 1620 


GCTCCCGGCA ACGATGGTGC TAAGGGTGAT GCTGGTGCCC CTGGAGCTCC CGGTAGCCAG 1680 

GGCGCCCCTG GCCTTCAGGG AATGCCTGGT GAACGTGGTG CAGCTGGTCT TCCAGGGCCT 1740 

AAGGGTGACA GAGGTGATGC TGGTCCCAAA GGTGCTGATG GCTCTCCTGG CAAAGATGGC 1800 


GTCCGTGGTC TGACCGGCCC CATTGGTCCT CCTGGCCCTG CTGGTGCCCC TGGTGACAAG 1860 

GGTGAAAGTG GTCCCAGCGG CCCTGCTGGT CCCACTGGAG CTCGTGGTGC CCCCGGAGAC 1920 


CGTGGTGAGC CTGGTCCCCC CGGCCCTGCT GGCTTTGCTG GCCCCCCTGG TGCTGACGGC 1980 

CAACCTGGTG CTAAAGGCGA ACCTGGTGAT GCTGGTGCCA AAGGCGATGC TGGTCCCCCT 2040

GGGCCTGCCG GACCCGCTGG ACCCCCTGGC CCCATTGGTA ATGTTGGTGC TCCTGGAGCC 2100 

AAAGGTGCTC GGGCAGCGCT GGTCCCCCTG GTGCTACTGG TTTCCCTGGT GCTGCTGGCC 2160 

GAGTCGGTCC TCCTGGCCCC TCTGGAAATG CTGGACCCCC TGGCCCTCCT GGTCCTGCTG 2220 

GCAAAGAAGG CGGCAAAGGT CCCCGTGGTG AGACTGGCCC TGCTGGACGT CCTGGTGAAG 2280 

TTGGTCCCCC TGGTCCCCCT GGCCCTGCTG GCGAGAAAGG ATCCCCTGGT GCTGATGGTC 2340

CTGCTGGTGC TCCTGGTACT CCCGGGCCTC AAGGTATTGC TGGACAGCGT GGTGTGGTCG 2400 

GCCTGCCTGG TCAGAGAGGA GAGAGAGGCT TCCCTGGTCT TCCTGGCCCC TCTGGTGAAC 2460 

CTGGCAAACA AGGTCCCTCT GGAGCAAGTG GTGAACGTGG TCCCCCCGGT CCCATGGGCC 2520 

CCCCTGGATT GGCTGGACCC CCTGGTGAAT CTGGACGTGA GGGGGCTCCT GCTGCCGAAG 2580

GTTCCCCTGG ACGAGACGGT TCTCCTGGCG CCAAGGGTGA CCGTGGTGAG ACCGGCCCCG 2640

CTGGACCCCC TGGTGCTCCT GGTGCTCCTG GTGCCCCTGG CCCCGTTGGC CCTGCTGGCA 2700 

AGAGTGGTGA TCGTGGTGAG ACTGGTCCTG CTGGTCCCGC CGGTCCCGTC GGCCCCGCTG 2760 

GCGCCCGTGG CCCCGCCGGA CCCCAAGGCC CCCGTGGTGA CAAGGGTGAG ACAGGCGAAC 2820 

AGGGCGACAG AGGCATAAAG GGTCACCGTG GCTTCTCTGG CCTCCAGGGT CCCCCTGGCC 2880

CTCCTGGCTC TCCTGGTGAA CAAGGTCCCT CTGGAGCCTC TGGTCCTGCT GGTCCCCGAG 2940 

GTGCCCCTGG CTCTGCTGGT GCTCCTGGCA AAGATGGACT CAACGGTCTC CCTGGCCCCA 3000 

TTGGGCCCCC TGGTCCTCGC GGTCGCACTG GTGATGCTGG TCCTGTTGGT CCCCCCGGCC 3060 

CTCCTGGACC TCCTGGTCCC CCTGGTCCTC CCAGCGCTGG TTTCGACTTC AGCTTCCTCC 3120 

CCCAGCCACC TCAAGAGAAG GCTCACGATG GTGGCCGCTA CTACCGGGCT 3170 
(2) INFORMATION FOR SEQ ID NO: 2: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 240 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 
CAGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC TGGCCCCATG 
GGTCCCTCTG GTCCTCGTGG TCTCCCTGGC CCCCCTGGTG CACCTGGTCC CCAAGGCTTC 
CAAGGTCCCC CTGGTGAGCC TGGCGAGCCT GGAGCTTCAG GTCCCATGGG TCCCCGAGGT 
CCCCCAGGTC CCCCTGGAAA GAATGGAGAT GATGGGGAAG CTGGAAAACC TGGTCGTCCT 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 100 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
GGATCCATGG GGCTCGCTGG CCCACCGGGC GAACCGGGTC CGCCAGGCCC GAAAGGTCCG 
CGTGGCGATA GCGGGCTCCC GGGCGATTCC TAATGGATCC 
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(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Gly Leu Ala Gly Pro Pro Gly Glu Pro Gly Pro Pro Gly Pro Lys Gly 
1 5 10 15

Pro Arg Gly Asp Ser 
20 

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 330 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 
CAGCGGGCCA GGAAGAAGAA TAAGAACTGC CGGCGCCACT CGCTCTATGT GGACTTCAGC 
GATGTGGGCT GGAATGACTG GATTGTGGCC CCACCAGGCT ACCAGGCCTT CTACTGCCAT 



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GGGGACTGCC CCTTTCCACT GGCTGACCAC CTCAACTCAA CCAACCATGC CATTGTGCAG 180 

ACCCTGGTCA ATTCTGTCAA TTCCAGTATC CCCAAAGCCT GTTGTGTGCC CACTGAACTG 240 

AGTGCCATCT CCATGCTGTA CCTGGATGAG TATGATAAGG TGGTACTGAA AAATTATCAG 300 

GAGATGGTAG TAGAGGGATG TGGGTGCCGC 330
(2) INFORMATION FOR SEQ ID NO: 6:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1169 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro 
20 25 30 

Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45 

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro 
50 55 60 

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro 
65 70 75 80 
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10 



Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly
85 90 95 

Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu 
100 105 110 

Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro 
115 120 125 

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140 

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala 
145 150 155 160 

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr 
165 170 175 



Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly 
180 185 190 



Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205 



Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala 
210 215 220 



Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240

Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255



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Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270 

Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly 
275 280 285 

Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu
290 295 300 

Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro 
305 310 315 320 

Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly 
325 330 335 

Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser 
340 345 350 

Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro 
355 360 365 

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly 
370 375 380 

Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400

Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415 



Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly 
420 425 430 



Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro 
435 440 445 

Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460

Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480

Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510 



Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525 



Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn 
530 535 540 



Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560 



Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575 



Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala 
580 585 590 



Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605 



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20 



Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly 
610 615 620 

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp 
625 630 635 640 

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro 
645 650 655 

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670 

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro 
675 680 685 

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700 



Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly 
705 710 715 720



Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro 
725 730 735 



Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr 
740 745 750 



Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly 
755 760 765 



Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala 
770 775 780 



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.15 



Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800

Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815

Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro 
835 840 845 

Gly Glu Ser Gly Arg Glu Gly Ala Pro Ala Ala Glu Gly Ser Pro Gly 
850 855 860 

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro 
865 870 875 880 



Ala Gly Pro Pro Gly Ala Xaa Gly Ala Xaa Gly Ala Pro Gly Pro Val 
885 890 895 



Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly 
900 905 910 



Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro 
915 920 925 



Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940



Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960 
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Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975

Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp
980 985 990

Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005 



Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro 
1010 1015 1020 

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu 
1025 1030 1035 1040

Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055

Ala Arg Ser Gln Arg Ala Arg Lys Lys Asn Lys Asn Cys Arg Arg His
1060 1065 1070

Ser Leu Tyr Val Asp Phe Ser Asp Val Gly Trp Asn Asp Trp Ile Val
1075 1080 1085

Ala Pro Pro Gly Tyr Gln Ala Phe Tyr Cys His Gly Asp Cys Pro Phe
1090 1095 1100

Pro Leu Ala Asp His Leu Asn Ser Thr Asn His Ala Ile Val Gln Thr
1105 1110 1115 1120

Leu Val Asn Ser Val Asn Ser Ser Ile Pro Lys Ala Cys Cys Val Pro
1125 1130 1135



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Thr Glu Leu Ser Ala Ile Ser Met Leu Tyr Leu Asp Glu Tyr Asp Lys
1140 1145 1150

Val Val Leu Lys Asn Tyr Gln Glu Met Val Val Glu Gly Cys Gly Cys
1155 1160 1165

Arg

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3531 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GGGAAGGATT TCCATTTCCC AGCTGTCTTA TGGCTATGAT GAGAAATCAA CCGGAGGAAT 60

TTCCGTGCCT GGCCCCATGG GTCCCTCTGG TCCTCGTGGT CTCCCTGGCC CCCCTGGTGC 120 

ACCTGGTCCC CAAGGCTTCC AAGGTCCCCC TGGTGAGCCT GGCGAGCCTG GAGCTTCAGG 180 

TCCCATGGGT CCCCGAGGTC CCCCAGGTCC CCCTGGAAAG AATGGAGATG ATGGGGAAGC 240 

TGGAAAACCT GGTCGTCCTG GTGAGCGTGG GCCTCCTGGG CCTCAGGGTG CTCGAGGATT 300 

GCCCGGAACA GCTGGCCTCC CTGGAATGAA GGGACACAGA GGTTTCAGTG GTTTGGATGG 360

TGCCAAGGGA GATGCTGGTC CTGCTGGTCC TAAGGGTGAG CCTGGCAGCC CTGGTGAAAA 420 

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CCCTGCTGGC TCCCCCGGAT TCCAGGGTCT CCCTGGTCCT GCTGGTCCTC CAGGTGAAGC 1500 

AGGCAAACCT GGTGAACAGG GTGTTCCTGG AGACCTTGGC GCCCCTGGCC CCTCTGGAGC 1560 

AAGAGGCGAG AGAGGTTTCC CTGGCGAGCG TGGTGTGCAA GGTCCCCCTG GTCCTGCTGG 1620 

ACCCCGAGGG GCCAACGGTG CTCCCGGCAA CGATGGTGCT AAGGGTGATG CTGGTGCCCC 1680 

TGGAGCTCCC GGTAGCCAGG GCGCCCCTGG CCTTCAGGGA ATGCCTGGTG AACGTGGTGC 1740

AGCTGGTCTT CCAGGGCCTA AGGGTGACAG AGGTGATGCT GGTCCCAAAG GTGCTGATGG 1800 

CTCTCCTGGC AAAGATGGCG TCCGTGGTCT GACCGGCCCC ATTGGTCCTC CTGGCCCTGC 1860

TGGTGCCCCT GGTGACAAGG GTGAAAGTGG TCCCAGCGGC CCTGCTGGTC CCACTGGAGC 1920 

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TGGTCCCCCC GGCCCTGCTG GCTTTGCTGG 1980 

CCCCCCTGGT GCTGACGGCC AACGTGGTGC TAAAGGCGAA CCTGGTGATG CTGGTGCCAA 2040 

AGGCGATGGG TCCCCCTGGG CCTGCCGGAC CCGCTGGACC CCCTGGCCCC ATTGGTAATG 2100 

TTGGTGCTCC TGGAGCCAAA GGTGCTCGCG GCAGCGCTGG TCCCCCTGGT GCTACTGGTT 2160 

TCCCTGGTGC TGCTGGCCGA GTCGGTCCTC CTGGCCCCTC TGGAAATGCT GGACCCCCTG 2220 

GCCCTCCTGG TCCTGCTGGC AAAGAAGGCG GCAAAGGTCC CCGTGGTGAG ACTGGCCCTG 2280 

CTGGACGTCC TGGTGAAGTT GGTCCCCCTG GTCCCCCTGG CCCTGCTGGC GAGAAAGGAT 2340 

CCCCTGGTGC TGATGGTCCT GCTGGTGCTC CTGGTACTCC CGGGCCTCAA GGTATTGCTG 2400 

GACAGCGTGG TGTGGTCGGC CTGCCTGGTC AGAGAGGAGA GAGAGGCTTC CCTGGTCTTC 2460



CTGGCCCCTC TGGTGAACCT GGCAAACAAG GTCCCTCTGG AGCAAGTGGT GAACGTGGTC 2520

CCCCCGGTCC CATGGGCCCC CCTGGATTGG CTGGACCCCC TGGTGAATCT GGACGTGAGG 2580

GGGCTCCTGC TGCCGAAGGT TCCCCTGGAC GAGACGGTTC TCCTGGCGCC AAGGGTGACC 2640

GTGGTGAGAC CGGCCCCGCT GGACCCCCTG GTGCTCTGGT GCTCTGGTGC CCCTGGCCCC 2700

GTTGGCCCTG CTGGCAAGAG TGGTGATCGT GGTGAGACTG GTCCTGCTGG TCCCGCCGGT 2760

CCCGTCGGCC CCGCTGGCGC CCGTGGCCCC GCCGGACCCC AAGGCCCCCG TGGTGACAAG 2820

GGTGAGACAG GCGAACAGGG CGACAGAGGC ATAAAGGGTC ACCGTGGCTT CTCTGGCCTC 2880

CAGGGTCCCC CTGGCCCTCC TGGCTCTCCT GGTGAACAAG GTCCCTCTGG AGCCTCTGGT 2940

CCTGCTGGTC CCCGAGGTCC CCCTGGCTCT GCTGGTGCTC CTGGCAAAGA TGGACTCAAC 3000

GGTCTCCCTG GCCCCATTGG GCCCCCTGGT CCTCGCGGTC GCACTGGTGA TGCTGGTCCT 3060

GTTGGTCCCC CCGGCCCTCC TGGACCTCCT GGTCCCCCTG GTCCTCCCAG CGCTGGTTTC 3120

GACTTCAGCT TCCTCCCCCA GCCACCTCAA GAGAAGGCTC ACGATGGTGG CCGCTACTAC 3180

CGGGCTAGAT CCCAGCGGGC CAGGAAGAAG AATAAGAACT GCCGGCGCCA CTCGCTCTAT 3240

GTGGACTTCA GCGATGTGGG CTGGAATGAC TGGATTGTGG CCCCACCAGG CTACCAGGCC 3300

TTCTACTGCC ATGGGGACTG CCCCTTTCCA CTGGCTGACC ACCTCAACTC AACCAACCAT 3360

GCCATTGTGC AGACCCTGGT CAATTCTGTC AATTCCAGTA TCCCCAAAGC CTGTTGTGTG 3420

CCCACTGAAC TGAGTGCCAT CTCCATGCTG TACCTGGATG AGTATGATAA GGTGGTACTG 3480

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AAAAATTATC AGGAGATGGT AGTAGAGGGA TGTGGGTGCC GCTAAAAGCT T 3531 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1171 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro
20 25 30



Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45 

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro 
50 55 60 

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro 
65 70 75 80 



Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly
85 90 95

Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu 
100 105 110 

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15 



Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro 
115 120 125 

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140 

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala 
145 150 155 160 

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr 
165 170 175 

Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly 
180 185 190 



Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205 



Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala 
210 215 220 



Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240 



Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255 



Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270 



Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly 
275 280 285 



Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu
290 295 300 



Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro 
305 310 315 320 



Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly 
325 330 335 



Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser 
340 345 350 



Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro 
355 360 365 

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly 
370 375 380 

Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400

Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415 

Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly 
420 425 430 



Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro 
435 440 445 



Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460 
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35 



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Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480

Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510

Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525

Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn 
530 535 540 



Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560

Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575 

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala 
580 585 590 

Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605 

Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly 
610 615 620 

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp 
625 630 635 640 
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Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro 
645 650 655 

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670 

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro 
675 680 685 

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700 

Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly 
705 710 715 720 



Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro 
725 730 735 



Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr 
740 745 750 



Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly 
755 760 765 



Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala 
770 775 780 



Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800 



Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815 



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Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro 
835 840 845 

Gly Glu Ser Gly Arg Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly 
850 855 860 

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro 
865 870 875 880 

Ala Gly Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val 
885 890 895 

Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly 
900 905 910 

Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro 
915 920 925 

Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940

Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960

Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975 



Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp 
980 985 990 
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Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005 

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro 
1010 1015 1020 

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu 
1025 1030 1035 1040 

Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055 

Ala Arg Ser Ala Leu Asp Thr Asn Tyr Cys Phe Ser Ser Thr Glu Lys 
1060 1065 1070 

Asn Cys Cys Val Arg Gln Leu Tyr Ile Asp Phe Arg Lys Asp Leu Gly
1075 1080 1085

Trp Lys Trp Ile His Glu Pro Lys Gly Tyr His Ala Asn Phe Cys Leu
1090 1095 1100

Gly Pro Cys Pro Tyr Ile Trp Ser Leu Asp Thr Gln Tyr Ser Lys Val
1105 1110 1115 1120

Leu Ala Leu Tyr Asn Gln His Asn Pro Gly Ala Ser Ala Ala Pro Cys
1125 1130 1135

Cys Val Pro Gln Ala Leu Glu Pro Leu Pro Ile Val Tyr Tyr Val Gly
1140 1145 1150

Arg Lys Pro Lys Val Glu Gln Leu Ser Asn Met Ile Val Arg Ser Cys
1155 1160 1165 



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Lys Cys Ser 
1170 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3541 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GGGAAGGATT TCCATTTCCC AGCTGTCTTA TGGCTATGAT GAGAAATCAA CCGGAGGAAT 60

TTCCGTGCCT GGCCCCATGG GTCCCTCTGG TCCTCGTGGT CTCCCTGGCC CCCCTGGTGC 120 

ACCTGGTCCC CAAGGCTTCC AAGGTCCCCC TGGTGAGCCT GGCGAGCCTG GAGCTTCAGG 180 
TCCCATGGGT CCCCGAGGTC CCCCAGGTCC CCCTGGAAAG AATGGAGATG ATGGGGAAGC 240 

TGGAAAACCT GGTCGTCCTG GTGAGCGTGG GCCTCCTGGG CCTCAGGGTG CTCGAGGATT 300 

GCCCGGAACA GCTGGCCTCC CTGGAATGAA GGGACACAGA GGTTTCAGTG GTTTGGATGG 360

TGCCAAGGGA GATGCTGGTC CTGCTGGTCC TAAGGGTGAG CCTGGCAGCC CTGGTGAAAA 420 

TGGAGCTCCT GGTCAGATGG GCCCCCGTGG CCTGCCTGGT GAGAGAGGTC GCCCTGGAGC 480 
CCCTGGCCCT GCTGGTGCTC GTGGAAATGA TGGTGCTACT GGTGCTGCCG GGCCCCCTGG 540 

TCCCACCGGC CCCGCTGGTC CTCCTGGCTT CCCTGGTGCT GTTGGTGCTA AGGGTGAAGC 600 

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TGGTCCCCAA GGGCCCCGAG GCTCTGAAGG TCCCCAGGGT GTGCGTGGTG AGCCTGGCCC 660 

CCCTGGCCCT GCTGGTGCTG CTGGCCCTGC TGGAAACCCT GGTGCTGATG GACAGCCTGG 720 

TGCTAAAGGT GCCAATGGTG CTCCTGGTAT TGCTGGTGCT CCTGGCTTCC CTGGTGCCCG 780 

AGGCCCCTCT GGACCCCAGG GCCCCGGCGG CCCTCCTGGT CCCAAGGGTA ACAGCGGTGA 840 

ACCTGGTGCT CCTGGCAGCA AAGGAGACAC TGGTGCTAAG GGAGAGCCTG GCCCTGTTGG 900 

TGTTCAAGGA CCCCCTGGCC CTGCTGGAGA GGAAGGAAAG CGAGGAGCTC GAGGTGAACC 960 

CGGACCCACT GGCCTGCCCG GACCCCCTGG CGAGCGTGGT GGACCTGGTA GCCGTGGTTT 1020 

CCCTGGCGCA GATGGTGTTG CTGGTCCCAA GGGTCCCGCT GGTGAACGTG GTTCTCCTGG 1080 

CCCCGCTGGC CCCAAAGGAT CTCCTGGTGA AGCTGGTCGT CCCGGTGAAG CTGGTCTGCC 1140 

TGGTGCCAAG GGTCTGACTG GAAGCCCTGG CAGCCCTGGT CCTGATGGCA AAACTGGCCC 1200 

CCCTGGTCCC GCCGGTCAAG ATGGTCGCCC CGGACCCCCA GGCCCACCTG GTGCCCGTGG 1260 

TCAGGCTGGT GTGATGGGAT TCCCTGGACC TAAAGGTGCT GCTGGAGAGC CCGGCAAGGC 1320 

TGGAGAGCGA GGTGTTCCCG GACCCCCTGG CGCTGTCGGT CCTGCTGGCA AAGATGGAGA 1380 

GGCTGGAGCT CAGGGACCCC CTGGCCCTGC TGGTCCCGCT GGCGAGAGAG GTGAACAAGG 1440 

CCCTGCTGGC TCCCCCGGAT TCCAGGGTCT CCCTGGTCCT GCTGGTCCTC CAGGTGAAGC 1500 

AGGCAAACCT GGTGAACAGG GTGTTCCTGG AGACCTTGGC GCCCCTGGCC CCTCTGGAGC 1560 

AAGAGGCGAG AGAGGTTTCC CTGGCGAGCG TGGTGTGCAA GGTCCCCCTG GTCCTGCTGG 1620 



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ACCCCGAGGG GCCAACGGTG CTCCCGGCAA CGATGGTGCT AAGGGTGATG CTGGTGCCCC 1680 

TGGAGCTCCC GGTAGCCAGG GCGCCCCTGG CCTTCAGGGA ATGCCTGGTG AACGTGGTGC 1740

AGCTGGTCTT CCAGGGCCTA AGGGTGACAG AGGTGATGCT GGTCCCAAAG GTGCTGATGG 1800 

CTCTCCTGGC AAAGATGGCG TCCGTGGTCT GACCGGCCCC ATTGGTCCTC CTGGCCCTGC 1860

TGGTGCCCCT GGTGACAAGG GTGAAAGTGG TCCCAGCGGC CCTGCTGGTC CCACTGGAGC 1920

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TGGTCCCCCC GGCCCTGCTG GCTTTGCTGG 1980 

CCCCCCTGGT GCTGACGGCC AACCTGGTGC TAAAGGCGAA CCTGGTGATG CTGGTGCCAA 2040 

AGGCGATGCT GGTCCCCCTG GGCCTGCCGG ACCCGCTGGA CCCCCTGGCC CCATTGGTAA 2100 

TGTTGGTGCT CCTGGAGCCA AAGGTGCTCG CGGCAGCGCT GGTCCCCCTG GTGCTACTGG 2160 

TTTCCCTGGT GCTGCTGGCC GAGTCGGTCC TCCTGGCCCC TCTGGAAATG CTGGACCCCC 2220

TGGCCCTCCT GGTCCTGCTG GCAAAGAAGG CGGCAAAGGT CCCCGTGGTG AGACTGGCCC 2280 

TGCTGGACGT CCTGGTGAAG TTGGTCCCCC TGGTCCCCCT GGCCCTGCTG GCGAGAAAGG 2340 

ATCCCCTGGT GCTGATGGTC CTGCTGGTGC TCCTGGTACT CCCGGGCCTC AAGGTATTGC 2400 

TGGACAGCGT GGTGTGGTCG GCCTGCCTGG TCAGAGAGGA GAGAGAGGCT TCCCTGGTCT 2460 

TCCTGGCCCC TCTGGTGAAC CTGGCAAACA AGGTCCCTCT GGAGCAAGTG GTGAACGTGG 2520

TCCCCCCGGT CCCATGGGCC CCCCTGGATT GGCTGGACCC CCTGGTGAAT CTGGACGTGA 2580

GGGGGCTCCT GCTGCCGAAG GTTCCCCTGG ACGAGACGGT TCTCCTGGCG CCAAGGGTGA 2640

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CCGTGGTGAG ACCGGCCCCG CTGGACCCCC TGGTGCTCCT GGTGCTCCTG GTGCCCCTGG 2700 

CCCCGTTGGC CCTGCTGGCA AGAGTGGTGA TCGTGGTGAG ACTGGTCCTG CTGGTCCCGC 2760 

CGGTCCCGTC GGCCCCGCTG GCGCCCGTGG CCCCGCCGGA CCCCAAGGCC CCCGTGGTGA 2820 

CAAGGGTGAG ACAGGCGAAC AGGGCGACAG AGGCATAAAG GGTCACCGTG GCTTCTCTGG 2880 

CCTCCAGGGT CCCCCTGGCC CTCCTGGCTC TCCTGGTGAA CAAGGTCCCT CTGGAGCCTC 2940

TGGTCCTGCT GGTCCCCGAG GTCCCCCTGG CTCTGCTGGT GCTCCTGGCA AAGATGGACT 3000 

CAACGGTCTC CCTGGCCCCA TTGGGCCCCC TGGTCCTCGC GGTCGCACTG GTGATGCTGG 3060 

TCCTGTTGGT CCCCCCGGCC CTCCTGGACC TCCTGGTCCC CCTGGTCCTC CCAGCGCTGG 3120 

TTTCGACTTC AGCTTCCTCC CCCAGCCACC TCAAGAGAAG GCTCACGATG GTGGCCGCTA 3180 

CTACCGGGCT AGATCTGCCC TGGACACCAA CTATTGCTTC AGCTCCACGG AGAAGAACTG 3240 

CTGCGTGCGG CAGCTGTACA TTGACTTCCG CAAGGACCTC GGCTGGAAGT GGATCCACGA 3300 

GCCCAAGGGC TACCATGCCA ACTTCTGCCT CGGGCCCTGC CCCTACATTT GGAGCCTGGA 3360 

CACGCAGTAC AGCAAGGTCC TGGCCCTGTA CAACCAGCAT AACCCGGGCG CCTCGGCGGC 3420 

GCCGTGCTGC GTGCCGCAGG CGCTGGAGCC GCTGCCCATC GTGTACTACG TGGGCCGCAA 3480 

GCCCAAGGTG GAGCAGCTGT CCAACATGAT CGTGCGCTCC TGCAAGTGCA GCTGATCTAG 3540 



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10 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1388 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro
20 25 30

Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45



Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro 
50 55 60 



Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro
65 70 75 80

Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly
85 90 95 

Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu 
100 105 110 
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Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro
115 120 125

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140 

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala 
145 150 155 160 

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr 
165 170 175 

Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly 
180 185 190 

Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala
210 215 220

Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240

Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255

Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270 

Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly 
275 280 285 
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Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu
290 295 300

Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro
305 310 315 320

Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly
325 330 335

Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser
340 345 350

Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro
355 360 365

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly
370 375 380

Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400

Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415

Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly
420 425 430

Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro
435 440 445

Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460
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Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480

Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510

Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525

Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn
530 535 540

Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560

Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575 

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala 
580 585 590 

Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605 

Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly 
610 615 620 

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp 
625 630 635 640 
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15 



Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro 
645 650 655 

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro
675 680 685

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700 

Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly 
705 710 715 720 



Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro 
725 730 735 



Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr 
740 745 750 



Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly 
755 760 765 



Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala 
770 775 780 



Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800 



Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815 
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Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830 

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro 
835 840 845 

Gly Glu Ser Gly Arg Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly 
850 855 860



Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro 

865 870 875 880 

Ala Gly Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val 

885 890 895 



Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly
900 905 910

Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro
915 920 925



Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940

Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960

Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975 

Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp 
980 985 990 
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Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005 

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro 
1010 1015 1020 

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu 
1025 1030 1035 1040 

Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055

Ala Arg Ser Asp Glu Ala Ser Gly Ile Gly Pro Glu Val Pro Asp Asp
1060 1065 1070

Arg Asp Phe Glu Pro Ser Leu Gly Pro Val Cys Pro Phe Arg Cys Gln
1075 1080 1085

Cys His Leu Arg Val Val Gln Cys Ser Asp Leu Gly Leu Asp Lys Val
1090 1095 1100

Pro Lys Asp Leu Pro Pro Asp Thr Thr Leu Leu Asp Leu Gln Asn Asn
1105 1110 1115 1120

Lys Ile Thr Glu Ile Lys Asp Gly Asp Phe Lys Asn Leu Lys Asn Leu
1125 1130 1135

His Ala Leu Ile Leu Val Asn Asn Lys Ile Ser Lys Val Ser Pro Gly
1140 1145 1150



Ala Phe Thr Pro Leu Val Lys Leu Glu Arg Leu Tyr Leu Ser Lys Asn
1155 1160 1165

Gln Leu Lys Glu Leu Pro Glu Lys Met Pro Lys Thr Leu Gln Glu Leu
1170 1175 1180

Arg Ala His Glu Asn Glu Ile Thr Lys Val Arg Lys Val Thr Phe Asn
1185 1190 1195 1200

Gly Leu Asn Gln Met Ile Val Ile Glu Leu Gly Thr Asn Pro Leu Lys
1205 1210 1215

Ser Ser Gly Ile Glu Asn Gly Ala Phe Gln Gly Met Lys Lys Leu Ser
1220 1225 1230



Tyr Ile Arg Ile Ala Asp Thr Asn Ile Thr Ser Ile Pro Gln Gly Leu
1235 1240 1245

Pro Pro Ser Leu Thr Glu Leu His Leu Asp Gly Asn Lys Ile Ser Arg
1250 1255 1260

Val Asp Ala Ala Ser Leu Lys Gly Leu Asn Asn Leu Ala Lys Leu Gly
1265 1270 1275 1280

Leu Ser Phe Asn Ser Ile Ser Ala Val Asp Asn Gly Ser Leu Ala Asn
1285 1290 1295

Thr Pro His Leu Arg Glu Leu His Leu Asp Asn Asn Lys Leu Thr Arg
1300 1305 1310

Val Pro Gly Gly Leu Ala Glu His Lys Tyr Ile Gln Val Val Tyr Leu
1315 1320 1325

His Asn Asn Asn Ile Ser Val Val Gly Ser Ser Asp Phe Cys Pro Pro
1330 1335 1340

Gly His Asn Thr Lys Lys Ala Ser Tyr Ser Gly Val Ser Leu Phe Ser
1345 1350 1355 1360

Asn Pro Val Gln Tyr Trp Glu Ile Gln Pro Ser Thr Phe Arg Cys Val
1365 1370 1375

Tyr Val Arg Ser Ala Ile Gln Leu Gly Asn Tyr Lys
1380 1385

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1107 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro
20 25 30

Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro
50 55 60

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro
65 70 75 80

Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly
85 90 95

Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu
100 105 110

Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro
115 120 125

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala
145 150 155 160

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr
165 170 175

Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly
180 185 190

Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205



Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala
210 215 220

Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240

Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255

Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270

Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly
275 280 285

Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu
290 295 300

Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro
305 310 315 320

Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly
325 330 335

Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser
340 345 350

Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro
355 360 365

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly
370 375 380

Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400

Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415

Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly
420 425 430

Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro 
435 440 445 

Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460

Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480

Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510

Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525

Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn
530 535 540

Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560

Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala
580 585 590

Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605



Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly
610 615 620

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp
625 630 635 640

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro
645 650 655

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro
675 680 685

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700

Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly
705 710 715 720

Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro
725 730 735

Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr
740 745 750

Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly
755 760 765

Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala
770 775 780

Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800

Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815

Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro
835 840 845

Gly Glu Ser Gly Arg Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly
850 855 860

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro
865 870 875 880

Ala Gly Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val
885 890 895

Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly
900 905 910

Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro
915 920 925

Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940

Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960

Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975

Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp
980 985 990

Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro
1010 1015 1020

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu
1025 1030 1035 1040

Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055

Ala Arg Ser Pro Lys Asp Leu Pro Pro Asp Thr Thr Leu Leu Asp Leu
1060 1065 1070

Gln Asn Asn Lys Ile Thr Glu Ile Lys Asp Gly Asp Phe Lys Asn Leu
1075 1080 1085

Lys Asn Leu His Ala Leu Ile Leu Val Asn Asn Lys Ile Ser Lys Val
1090 1095 1100

Ser Pro Gly
1105

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4167 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
CAGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC TGGCCCCATG 60

GGTCCCTCTG GTCCTCGTGG TCTCCCTGGC CCCCCTGGTG CACCTGGTCC CCAAGGCTTC 120 

CAAGGTCCCC CTGGTGAGCC TGGCGAGCCT GGAGCTTCAG GTCCCATGGG TCCCCGAGGT 180 
CCCCCAGGTC CCCCTGGAAA GAATGGAGAT GATGGGGAAG CTGGAAAACC TGGTCGTCCT 240 

GGTGAGCGTG GGCCTCCTGC GCCTCAGGGT GCTCGAGGAT TGCCCGGAAC AGCTGGCCTC 300 

CCTGGAATGA AGGGACACAG AGGTTTCAGT GGTTTGGATG GTGCCAAGGG AGATGCTGGT 360 
CCTGCTGGTC CTAAGGGTGA GCCTGGCAGC CCTGGTGAAA ATGGAGCTCC TGGTCAGATG 420 

GGCCCCCGTG GCCTGCCTGG TGAGAGAGGT CGCCCTGGAG CCCCTGGCCC TGCTGGTGCT 480 
CGTGGAAATG ATGGTGCTAC TGGTGCTGCC GGGCCCCCTG GTCCCACCGG CCCCGCTGGT 540

CCTCCTGGCT TCCCTGGTGC TGTTGGTGCT AAGGGTGAAG CTGGTCCCCA AGGGCCCCGA 600 

GGCTCTGAAG GTCCCCAGGG TGTGCGTGGT GAGCCTGGCC CCCCTGGCCC TGCTGGTGCT 660

GCTGGCCCTG CTGGAAACCC TGGTGCTGAT GGACAGCCTG GTGCTAAAGG TGCCAATGGT 720 
GCTCCTGGTA TTGCTGGTGC TCCTGGCTTC CCTGGTGCCC GAGGCCCCTC TGGACCCCAG 780

GGCCCCGGCG GCCCTCCTGG TCCCAAGGGT AACAGCGGTG AACCTGGTGC TCCTGGCAGC 840 

AAAGGAGACA CTGGTGCTAA GGGAGAGCCT GGCCCTGTTG GTGTTCAAGG ACCCCCTGGC 900 
CCTGCTGGAG AGCAAGGAAA GCGAGGAGCT CGAGGTGAAC CCGGACCCAC TGGCCTGCCC 960 

GGACCCCCTG GCGAGCGTGG TGGACCTGGT AGCCGTGGTT TCCCTGGCGC AGATGGTGTT 1020 
GCTGGTCCCA AGGGTCCCGC TGGTGAACGT GGTTCTCCTG GCCCCGCTGG CCCCAAAGGA 1080

TCTCCTCGTG AAGCTGGTCG TCCCGGTGAA GCTGGTCTGC CTGGTGCCAA GGGTCTGACT 1140 

GGAAGCCCTG GCAGCCCTGG TCCTGATGGC AAAACTGGCC CCCCTGGTCC CGCCGGTCAA 1200 
GATGGTCGCC CCGGACCCCC AGGCCCACCT GGTGCCCGTG GTCAGGCTGG TGTGATGGGA 1260 

TTCCCTGGAC CTAAAGGTGC TGCTCGAGAG CCCGGCAAGG CTGGAGAGCG AGGTGTTCCC 1320 
GGACCCCCTG GCGCTGTCGG TCCTGCTGGC AAAGATGGAG AGGCTGGAGC TCAGGGACCC 1380

CCTGGCCCTG CTGGTCCCGC TGGCGAGAGA GGTGAACAAG GCCCTGCTGG CTCCCCCGGA 1440 

TTCCAGGGTC TCCCTGGTCC TGCTGGTCCT CCAGGTGAAG CAGGCAAACC TGGTGAACAG 1500 
GGTGTTCCTG GAGACCTTGG CGCCCCTGGC CCCTCTGGAG CAAGAGGCGA GAGAGGTTTC 1560 

CCTGGCGAGC GTGGTGTGCA AGGTCCCCCT GGTCCTGCTG GACCCCGAGG GGCCAACGGT 1620

GCTCCCGCCA ACGATGCTGC TAAGGGTGAT GCTGGTGCCC CTGGAGCTCC CGGTAGCCAG 1680

GGCGCCCCTG GCCTTCAGGG AATGCCTGGT GAACGTGGTG CAGCTGGTCT TCCAGGGCCT 1740

AAGGGTGACA GAGGTGATGC TGGTCCCAAA GGTGCTGATG GCTCTCCTGG CAAAGATGGC 1800

GTCCGTGGTC TGACCGACCC CATTGGTCCT CCTGGCCCTG CTGGTGCCCC TGGTGACAAG 1860

GGTGAAAGTG GTCCCAGCGG CCCTGCTGGT CCCACTGGAG CTCGTGGTGC CCCCGGAGAC 1920

CGTGGTGAGC CTGGTCCCCC CGGCCCTGCT GGCTTTGCTG GCCCCCCTGG TGCTGACGGC 1980

CAACCTGGTG CTAAAGGCGA ACCTGGTGAT GCTGGTGCCA AAGGCGATGC TGGTCCCCCT 2040

GGGCCTGCCG GACCCGCTGG ACCCCCTGGC CCCATTGGTA ATGTTGGTGC TCCTGGAGCC 2100

AAACGTGCTC GCGGCAGCGC TGGTCCCCCT GGTGCTACTG GTTTCCCTGG TGCTGCTGGC 2160

CGAGTCGGTC CTCCTGGCCC CTCTGGAAAT GCTGGACCCC CTGGCCCTCC TGGTCCTGCT 2220

GGCAAAGAAG GCGGCAAAGG TCCCCGTGGT GAGACTGGCC CTGCTGGACG TCCTGGTGAA 2280

GTTGGTCCCC CTGGTCCCCC TGGCCCTGCT GGCGAGAAAG GATCCCCTGG TGCTGATGGT 2340

CCTGCTGGTG CTCCTGGTAC TCCCGGGCCT CAAGGTATTG CTGGACAGCG TGGTGTGGTC 2400

GGCCTGCCTG GTCAGAGAGG AGAGAGAGGC TTCCCTGGTC TTCTTGGCCC CTCTGGTGAA 2460

CCTGGCAAAC AAGGTCCCTC TGGAGCAAGT GGTGAACGTG GTCCCCCCGG TCCCATGGGC 2520

CCCCCTGGAT TGGCTGGACC CCCTGGTGAA TCTGGACGTG AGGGGGCTCC TGCTGCCGAA 2580

GGTTCCCCTG GACGAGACGG TTCTCCTGGC GCCAAGGGTG ACCGTGGTGA GACCGGCCCC 2640

GAAAATGGGG CTTTCCAGGG AATGAAGAAG CTCTCCTACA TCCGCATTGC TGATACCAAT 3720

ATCACCAGCA TTCCTCAAGG TCTTCCTCCT TCCCTTACGG AATTACATCT TGATGGCAAC 3780 

AAAATCAGCA GAGTTGATGC AGCTAGCCTG AAAGGACTGA ATAATTTGGC TAAGTTGGGA 3840

TTGAGTTTCA ACAGCATCTC TGCTGTTGAC AATGGCTCTC TGGCCAACAC GCCTCATCTG 3900 

AGGGAGCTTC ACTTGGACAA CAACAAGCTT ACCAGAGTAC CTGGTGGGCT GGCAGAGCAT 3960 

AAGTACATCC AGGTTGTCTA CCTTCATAAC AACAATATCT CTGTAGTTGG ATCAAGTGAC 4020 

TTCTGCCCAC CTGGACACAA CACCAAAAAG GCTTCTTATT CGGGTGTGAG TCTTTTCAGC 4080 

AACCCGGTCC AGTACTGGGA GATACAGCCA TCCACCTTCA GATGTGTCTA CGTGCGCTCT 4140

GCCATTCAAC TCGGAAACTA TAAGTAA 4167 



(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3349 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GGGAAGGATT TCCATTTCCC AGCTGTCTTA TGGCTATGAT GAGAAATCAA CCGGAGGAAT 60

TTCCGTGCCT GGCCCCATGG GTCCCTCTGG TCCTCGTGGT CTCCCTGGCC CCCCTGGTGC 120

ACCTGGTCCC CAAGGCTTCC AAGGTCCCCC TGGTGAGCCT GGCGAGCCTG GAGCTTCAGG 180 

TCCCATGGGT CCCCGAGGTC CCCCAGGTCC CCCTGGAAAG AATGGAGATG ATGGGGAAGC 240 

TGGAAAACCT GGTCGTCCTG GTGAGCGTGG GCCTCCTGGG CCTCAGGGTG CTCGAGGATT 300 

GCCCGGAACA GCTGGCCTCC CTGGAATGAA GGGACACAGA GGTTTCAGTG GTTTGGATGG 360 

TGCCAAGGGA GATGCTGGTC CTGCTGGTCC TAAGGGTGAG CCTGGCAGCC CTGGTGAAAA 420 

TGGAGCTCCT GGTCAGATGG GCCCCCGTGG CCTGCCTGGT GAGAGAGGTC GCCCTGGAGC 480 

CCCTGGCCCT GCTGGTGCTC GTGGAAATGA TGGTGCTACT GGTGCTGCCG GGCCCCCTGG 540

TCCCACCGGC CCCGCTGGTC CTCCTGGCTT CCCTGGTGCT GTTGGTGCTA AGGGTGAAGC 600 

TGGTCCCCAA GGGCCCCGAG GCTCTGAAGG TCCCCAGGGT GTGCGTGGTG AGCCTGGCCC 660 

CCCTGGCCCT GCTGGTGCTG CTGGCCCTGC TGGAAACCCT GGTGCTGATG GACAGCCTGG 720 

TGCTAAAGGT GCCAATGGTG CTCCTGGTAT TGCTGGTGCT CCTGGCTTCC CTGGTGCCCG 780 

AGGCCCCTCT GGACCCCAGG GCCCCGGCGG CCCTCCTGGT CCCAAGGGTA ACAGCGGTGA 840

ACCTGGTGCT CCTGGCAGCA AAGGAGACAC TGGTGCTAAG GGAGAGCCTG GCCCTGTTGG 900 



TGTTCAAGGA CCCCCTGGCC CTGCTGGAGA GGAAGGAAAG CGAGGAGCTC GAGGTGAACC 960 
CGGACCCACT GGCCTGCCCG GACCCCCTGG CGAGCGTGGT GGACCTGGTA GCCGTGGTTT 1020 
CCCTGGCGCA GATGGTGTTG CTGGTCCCAA GGGTCCCGCT GGTGAACGTG GTTCTCCTGG 1080

CCCCGCTGGC CCCAAAGGAT CTCCTGGTGA AGCTGGTCGT CCCGGTGAAG CTGGTCTGCC 1140

TGGTGCCAAG GGTCTGACTG GAAGCCCTGG CAGCCCTGGT CCTGATGGCA AAACTGGCCC 1200

CCCTGGTCCC GCCGGTCAAG ATGGTCGCCC CGGACCCCCA GGCCCACCTG GTGCCCGTGG 1260

TCAGGCTGGT GTGATGGGAT TCCCTGGACC TAAAGGTGCT GCTGGAGAGC CCGGCAAGGC 1320

TGGAGAGCGA GGTGTTCCCG GACCCCCTGG CGCTGTCGGT CCTGCTGGCA AAGATGGAGA 1380

GGCTGGAGCT CAGGGACCCC CTGGCCCTGC TGGTCCCGCT GGCGAGAGAG GTGAACAAGG 1440

CCCTGCTGGC TCCCCCGGAT TCCAGGGTCT CCCTGGTCCT GCTGGTCCTC CAGGTGAAGC 1500

AGGCAAACCT GGTGAACAGG GTGTTCCTGG AGACCTTGGC GCCCCTGGCC CCTCTGGAGC 1560

AAGAGGCGAG AGAGGTTTCC CTGGCGAGCG TGGTGTGCAA GGTCCCCCTG GTCCTGCTGG 1620

ACCCCGAGGG GCCAACGGTG CTCCCGGCAA CGATGGTGCT AAGGGTGATG CTGGTGCCCC 1680

TGGAGCTCCC GGTAGCCAGG GCGCCCCTGG CCTTCAGGGA ATGCCTGGTG AACGTGGTGC 1740

AGCTGGTCTT CCAGGGCCTA AGGGTGACAG AGGTGATGCT GGTCCCAAAG GTGCTGATGG 1800

CTCTCCTGGC AAAGATGGCG TCCGTGGTCT GACCGGCCCC ATTGGTCCTC CTGGCCCTGC 1860

TGGTGCCCCT GGTGACAAGG GTGAAAGTGG TCCCAGCGGC CCTGCTGGTC CCACTGGAGC 1920

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TGGTCCCCCC GGCCCTGCTG GCTTTGCTGG 1980

CCCCCCTGGT GCTGACGGCC AACCTGGTGC TAAAGGCGAA CCTGGTGATG CTGGTGCCAA 2040

AGGCGATGCT GGTCCCCCTG GGCCTGCCGG ACCCGCTGGA CCCCCTGGCC CCATTGGTAA 2100



TTTCGACTTC AGCTTCCTCC CCCAGCCACC TCAAGAGAAG GCTCACGATG GTGGCCGCTA 3180 

CTACCGGGCT AGATCTCCAA AGGATCTTCC CCCTGACACA ACTCTGCTAG ACCTGCAAAA 3240 

CAACAAAATA ACCGAAATCA AAGATGGAGA CTTTAAGAAC CTGAAGAACC TTCACGCATT 3300 

GATTCTTGTC AACAATAAAA TTAGCAAAGT TAGTCCTGGA TAACTGCAG 3349 
(2) INFORMATION FOR SEQ ID NO: 14:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: 
ATCGAGGGAA GGATTTCAGA ATTCGGATCC TCTAGAGTCG ACCTGCAGGC AAGCTTG 57 
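
A listing entry can be sanity-checked by confirming that the bases printed on the sequence line add up to the declared LENGTH. The following is a minimal sketch under that assumption; the helper name `count_bases` is hypothetical and not part of the source, and the input is SEQ ID NO: 14 exactly as printed above.

```python
def count_bases(listing_line):
    """Count nucleotides on a listing line.

    Spaces between ten-base blocks and the trailing position number
    are ignored because only A, C, G and T characters are counted.
    """
    return sum(1 for ch in listing_line.upper() if ch in "ACGT")


# SEQ ID NO: 14 as printed in the listing, including its end-of-line position.
seq_id_14 = "ATCGAGGGAA GGATTTCAGA ATTCGGATCC TCTAGAGTCG ACCTGCAGGC AAGCTTG 57"

# The declared LENGTH for SEQ ID NO: 14 is 57 base pairs.
assert count_bases(seq_id_14) == 57
```

The same check applied line by line to a longer entry would flag a dropped or duplicated ten-base block, since each full line should advance the running total by 60.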
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3171 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

CAGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC TGGCCCCATG 60 
GGTCCCTCTG GTCCTCGTGG TCTCCCTGGC CCCCCTGGTG CACCTGGTCC CCAAGGCTTC 120

CAAGGTCCCC CTGGTGAGCC TGGCGAGCCT GGAGCTTCAG GTCCCATGGG TCCCCGAGGT 180 

CCCCCAGGTC CCCCTGGAAA GAATGGAGAT GATGGGGAAG CTGGAAAACC TGGTCGTCCT 240 
GGTGAGCGTG GGCCTCCTGG GCCTCAGGGT GCTCGAGGAT TGCCCGGAAC AGCTGGCCTC 300 

CCTGGAATGA AGGGACACAG AGGTTTCAGT GGTTTGGATG GTGCCAAGGG AGATGCTGGT 360 
CCTGCTGGTC CTAAGGGTGA GCCTGGCAGC CCTGGTGAAA ATGGAGCTCC TGGTCAGATG 420

GGCCCCCGTG GCCTGCCTGG TGAGAGAGGT CGCCCTGGAG CCCCTGGCCC TGCTGGTGCT 480 

CGTGGAAATG ATGGTGCTAC TGGTGCTGCC GGGCCCCCTG GTCCCACCGG CCCCGCTGGT 540 
CCTCCTGGCT TCCCTGGTGC TGTTGGTGCT AAGGGTGAAG CTGGTCCCCA AGGGCCCCGA 600 

GGCTCTGAAG GTCCCCAGGG TGTGCGTGGT GAGCCTGGCC CCCCTGGCCC TGCTGGTGCT 660 
GCTGGCCCTG CTGGAAACCC TGGTGCTGAT GGACAGCCTG GTGCTAAAGG TGCCAATGGT 720

GCTCCTGGTA TTGCTGGTGC TCCTGGCTTC CCTGGTGCCC GAGGCCCCTC TGGACCCCAG 780 

GGCCCCGGCG GCCCTCCTGG TCCCAAGGGT AACAGCGGTG AACCTGGTGC TCCTGGCAGC 840 
AAAGGAGACA CTGGTGCTAA GGGAGAGCCT GGCCCTGTTG GTGTTCAAGG ACCCCCTGGC 900 

CCTGCTGGAG AGGAAGGAAA GCGAGGAGCT CGAGGTGAAC CCGGACCCAC TGGCCTGCCC 960

GGACCCCCTG GCGAGCGTGG TGGACCTGGT AGCCGTGGTT TCCCTGGCGC AGATGGTGTT 1020 

GCTGGTCCCA AGGGTCCCGC TGGTGAACGT GGTTCTCCTG GCCCCGCTGG CCCCAAAGGA 1080 

TCTCCTGGTG AAGCTGGTCG TCCCGGTGAA GCTGGTCTGC CTGGTGCCAA GGGTCTGACT 1140 

GGAAGCCCTG GCAGCCCTGG TCCTGATGGC AAAACTGGCC CCCCTGGTCC CGCCGGTCAA 1200 

GATGGTCGCC CCGGACCCCC AGGCCCACCT GGTGCCCGTG GTCAGGCTGG TGTGATGGGA 1260

TTCCCTGGAC CTAAAGGTGC TGCTGGAGAG CCCGGCAAGG CTGGAGAGCG AGGTGTTCCC 1320 

GGACCCCCTG GCGCTGTCGG TCCTGCTGGC AAAGATGGAG AGGCTGGAGC TCAGGGACCC 1380 

CCTGGCCCTG CTGGTCCCGC TGGCGAGAGA GGTGAACAAG GCCCTGCTGG CTCCCCCGGA 1440 

TTCCAGGGTC TCCCTGGTCC TGCTGGTCCT CCAGGTGAAG CAGGCAAACC TGGTGAACAG 1500 

GGTGTTCCTG GAGACCTTGG CGCCCCTGGC CCCTCTGGAG CAAGAGGCGA GAGAGGTTTC 1560

CCTGGCGAGC GTGGTGTGCA AGGTCCCCCT GGTCCTGCTG GACCCCGAGG GGCCAACGGT 1620 

GCTCCCGGCA ACGATGGTGC TAAGGGTGAT GCTGGTGCCC CTGGAGCTCC CGGTAGCCAG 1680 

GGCGCCCCTG GCCTTCAGGG AATGCCTGGT GAACGTGGTG CAGCTGGTCT TCCAGGGCCT 1740 

AAGGGTGACA GAGGTGATGC TGGTCCCAAA GGTGCTGATG GCTCTCCTGG CAAAGATGGC 1800 

GTCCGTGGTC TGACCGGCCC CATTGGTCCT CCTGGCCCTG CTGGTGCCCC TGGTGACAAG 1860

GGTGAAAGTG GTCCCAGCGG CCCTGCTGGT CCCACTGGAG CTCGTGGTGC CCCCGGAGAC 1920 

CGTGGTGAGC CTGGTCCCCC CGGCCCTGCT GGCTTTGCTG GCCCCCCTGG TGCTGACGGC 1980

CAACCTGGTG CTAAAGGCGA ACCTGGTGAT GCTGGTGCCA AAGGCGATGC TGGTCCCCCT 2040 

GGGCCTGCCG GACCCGCTGG ACCCCCTGGC CCCATTGGTA ATGTTGGTGC TCCTGGAGCC 2100

AAAGGTGCTC GCGGCAGCGC TGGTCCCCCT GGTGCTACTG GTTTCCCTGG TGCTGCTGGC 2160 

CGAGTCGGTC CTCCTGGCCC CTCTGGAAAT GCTGGACCCC CTGGCCCTCC TGGTCCTGCT 2220 

GGCAAAGAAG GCGGCAAAGG TCCCCGTGGT GAGACTGGCC CTGCTGGACG TCCTGGTGAA 2280 

GTTGGTCCCC CTGGTCCCCC TGGCCCTGCT GGCGAGAAAG GATCCCCTGG TGCTGATGGT 2340

CCTGCTGGTG CTCCTGGTAC TCCCGGGCCT CAAGGTATTG CTGGACAGCG TGGTGTGGTC 2400

GGCCTGCCTG GTCAGAGAGG AGAGAGAGGC TTCCCTGGTC TTCCTGGCCC CTCTGGTGAA 2460 

CCTGGCAAAC AAGGTCCCTC TGGAGCAAGT GGTGAACGTG GTCCCCCCGG TCCCATGGGC 2520 

CCCCCTGGAT TGGCTGGACC CCCTGGTGAA TCTGGACGTG AGGGGGCTCC TGCTGCCGAA 2580 

GGTTCCCCTG GACGAGACGG TTCTCCTGGC GCCAAGGGTG ACCGTGGTGA GACCGGCCCC 2640 

GCTGGACCCC CTGGTGCTCC TGGTGCTCCT GGTGCCCCTG GCCCCGTTGG CCCTGCTGGC 2700

AAGAGTGGTG ATCGTGGTGA GACTGGTCCT GCTGGTCCCG CCGGTCCCGT CGGCCCCGCT 2760 

GGCGCCCGTG GCCCCGCCGG ACCCCAAGGC CCCCGTGGTG ACAAGGGTGA GACAGGCGAA 2820

CAGGGCGACA GAGGCATAAA GGGTCACCGT GGCTTCTCTG GCCTCCAGGG TCCCCCTGGC 2880 

CCTCCTGGCT CTCCTGGTGA ACAAGGTCCC TCTGGAGCCT CTGGTCCTGC TGGTCCCCGA 2940 

GGTCCCCCTG GCTCTGCTGG TGCTCCTGGC AAAGATGGAC TCAACGGTCT CCCTGGCCCC 3000

ATTGGGCCCC CTGGTCCTCG CGGTCGCACT GGTGATGCTG GTCCTGTTGG TCCCCCCGGC 3060 

CCTCCTGGAC CTCCTGGTCC CCCTGGTCCT CCCAGCGCTG GTTTCGACTT CAGCTTCCTC 3120 

CCCCAGCCAC CTCAAGAGAA GGCTCACGAT GGTGGCCGCT ACTACCGGGC T 3171
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1057 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro
20 25 30

Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro
50 55 60

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro
65 70 75 80

Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly
85 90 95

Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu
100 105 110

Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro
115 120 125

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala
145 150 155 160

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr
165 170 175

Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly
180 185 190

Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala
210 215 220

Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240

Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255



Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270

Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly
275 280 285

Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu
290 295 300

Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro
305 310 315 320

Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly
325 330 335

Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser
340 345 350

Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro
355 360 365

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly
370 375 380



Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400

Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415

Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly
420 425 430



Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro
435 440 445

Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460

Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480

Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510

Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525

Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn
530 535 540

Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560

Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala
580 585 590

Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605

Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly 
610 615 620 

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp 
625 630 635 640 

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro 
645 650 655 

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670 

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro 
675 680 685 

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700 

Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly 
705 710 715 720 

Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro 
725 730 735 

Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr 
740 745 750 



Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly
755 760 765

Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala
770 775 780

Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800

Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815

Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830 

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro 
835 840 845 

Gly Glu Ser Gly Arg Glu Gly Ala Pro Ala Ala Glu Gly Ser Pro Gly 
850 855 860 

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro 
865 870 875 880 

Ala Gly Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val 
885 890 895 

Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly 
900 905 910 

Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro 
915 920 925

Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940

Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960

Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975

Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp
980 985 990

Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro
1010 1015 1020

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu
1025 1030 1035 1040

Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055

Ala



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Region 

(B) LOCATION: 1..2 

(D) OTHER INFORMATION: /note= "Amino acid sequence for 
glutathione S-transferase" 

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: 19..20

(D) OTHER INFORMATION: /note= "338 repeats of the
following triplet Gly-X-Y wherein about 35% of the X and Y
positions are occupied by proline and 4-hydroxyproline."

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

Xaa Met Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile
1 5 10 15

Ser Val Pro Xaa Ser Ala Gly Phe Asp Phe Ser Phe Leu Pro Gln Pro
20 25 30

Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg Ala
35 40 45
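
The feature note for this sequence describes a domain of Gly-X-Y triplets in which a stated share of the X and Y positions is proline or 4-hydroxyproline. The sketch below illustrates how such a share could be computed from three-letter residue codes. The function name `gly_xy_stats` and the code `Hyp` for 4-hydroxyproline are assumptions made for illustration, and the input is only a short fragment (residues 18 to 32 of SEQ ID NO: 16 as printed in this listing), so its result says nothing about the 35% figure quoted for the full domain.

```python
def gly_xy_stats(residues):
    """Return (triplet count, fraction of X/Y positions that are Pro or Hyp).

    `residues` is a space-separated string of three-letter codes that must
    start in the Gly-X-Y register and contain a whole number of triplets.
    """
    codes = residues.split()
    if not codes or len(codes) % 3 != 0:
        raise ValueError("expected a non-empty whole number of triplets")
    triplets = [codes[i:i + 3] for i in range(0, len(codes), 3)]
    for gly, _x, _y in triplets:
        if gly != "Gly":
            raise ValueError("triplet does not start with Gly")
    # Collect every X and Y residue, then count the imino acids among them.
    xy_positions = [res for _gly, x, y in triplets for res in (x, y)]
    imino = sum(res in ("Pro", "Hyp") for res in xy_positions)
    return len(triplets), imino / len(xy_positions)


# Illustrative fragment only: residues 18-32 of SEQ ID NO: 16.
fragment = "Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro"
n_triplets, fraction = gly_xy_stats(fragment)
assert (n_triplets, fraction) == (5, 0.6)
```

Raising on a triplet that does not begin with Gly is a deliberate choice here: a register slip caused by a dropped residue in the listing would otherwise go unnoticed.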

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: 1..2

(D) OTHER INFORMATION: /note= "Amino acid sequence for
glutathione S-transferase."

(ix) FEATURE:

(A) NAME/KEY: Region

(B) LOCATION: 4..5

(D) OTHER INFORMATION: /note= "338 repeats of the
following triplet Gly-X-Y wherein about 35% of the X and Y
positions are occupied by proline and 4-hydroxyproline."



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Xaa Met Gly Xaa Tyr Ser Ala Gly Phe Asp Phe Ser Phe Leu Pro Gln
1 5 10 15

Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg Ala
20 25 30

(2) INFORMATION FOR SEQ ID NO:19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3171 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

CAGCTGAGCT ATGGCTATGA TGAAAAAAGC ACCGGCGGCA TCAGCGTGCC GGGCCCGATG 60

GGTCCGAGCG GCCCTCGTGG CCTGCCGGGC CCGCCAGGTG CGCCCGGTCC GCAGGGCTTT 120

CAGGGTCCGC CGGGCGAACC GGGCGAACCT GGTGCGAGCG GCCCGATGGG CCCGCGCGGC 180 
CCGCCGGGTC CGCCAGGCAA AAACGGCGAT GATGGCGAAG CGGGCAAACC GGGACGTCCG 240

GGTGAACGTG GCCCCCCGGG CCCGCAGGGC GCGCGCGGAC TGCCGGGTAC TGCGGGACTG 300 

CCGGGCATGA AAGGCCACCG CGGTTTCTCT GGTCTGGATG GTGCGAAAGG TGATGCGGGT 360 
CCGGCGGGTC CGAAAGGTGA GCCGGGCAGC CCGGGCGAAA ACGGCGCGCC GGGTCAGATG 420 

GGCCCGCGTG GCCTGCCTGG TGAACGCGGT CGCCCGGGCG CCCCGGGCCC AGCTGGCGCA 480 
CGTGGCAACG ATGGTGCGAC CGGTGCGGCC GGTCCACCGG GCCCGACGGG CCCGGCGGGT 540

CCCCCGGGCT TTCCGGGTGC GGTGGGTGCG AAAGGCGAAG CAGGTCCGCA GGGGCCGCGC 600 

GGGAGCGAGG GTCCTCAGGG CGTTCGTGGT GAACCGGGCC CGCCGGGCCC GGCGGGTGCG 660 
GCGGGCCCGG CTGGTAACCC TGGCGCGGAC GGTCAGCCAG GTGCGAAAGG TGCCAACGGC 720 

GCGCCGGGTA TTGCAGGTGC ACCGGGCTTC CCGGGTGCCC GCGGCCCGTC CGGCCCGCAG 780 
GGCCCGGGCG GCCCGCCCGG CCCGAAAGGG AACAGCGGTG AACCGGGTGC GCCAGGCAGC 840

AAAGGCGACA CCGGTGCGAA AGGTGAACCG GGCCCAGTGG GTGTTCAAGG CCCGCCGGGC 900 

CCGGCGGGCG AGGAAGGCAA ACGCGGTGCT CGCGGTGAAC CGGGCCCGAC CGGCCTGCCT 960 

GGCCCGCCGG GAGAACGTGG TGGCCCGGGT AGCCGCGGTT TTCCGGGCGC GGATGGTGTG 1020

GCGGGCCCGA AAGGTCCGGC GGGTGAACGT GGTAGCCCGG GCCCGGCGGG CCCAAAAGGC 1080

AGCCCGGGCG AGGCAGGACG TCCGGGTGAA GCGGGTCTCC CGGGCGCCAA AGGTCTGACC 1140

GGCTCTCCGG GCAGCCCGGG TCCGGATGGC AAAACGGGCC CGCCTGGTCC GGCCGGCCAG 1200

GATGGTCGCC CGGGCCCGCC GGGCCCGCCG GGTGCCCGTG GTCAGGCGGG TGTCATGGGC 1260

TTTCCAGGCC CCAAAGGTGC GGCGGGTGAA CCGGGCAAAG CGGGCGAACG CGGTGTCCCG 1320

GGTCCGCCGG GCGCTGTCGG GCCGGCGGGC AAAGATGGCG AAGCGGGCGC GCAAGGCCCG 1380

CCGGGACCAG CGGGTCCGGC GGGCGAGCGC GGTGAACAGG GCCCGGCAGG CAGCCCGGGT 1440

TTCCAGGGTC TGCCGGGCCC TGCGGGTCCA CCGGGTGAAG CGGGCAAACC GGGGGAACAA 1500

GGTGTGCCGG GCGACCTGGG CGCCCCAGGC CCGAGCGGCG CGCGCGGCGA ACGCGGTTTC 1560

CCGGGCGAAC GTGGTGTGCA GGGCCCGCCC GGCCCGGCTG GTCCGCGCGG CGCCAACGGC 1620

GCGCCGGGCA ACGATGGTGC GAAAGGTGAT GCGGGTGCCC CAGGTGCGCC GGGCAGCCAG 1680

GGCGCCCCGG GGCTGCAAGG CATGCCGGGT GAACGTGGTG CCGCGGGTCT ACCGGGTCCG 1740

AAAGGCGACC GCGGTGATGC GGGTCCAAAA GGTGCGGATG GCTCCCCTGG CAAAGATGGC 1800

GTTCGTGGTC TGACCGGCCC GATCGGCCCG CCGGGCCCGG CAGGTGCGCC GGGTGACAAA 1860

GGTGAAAGCG GTCCGAGCGG CCCAGCGGGC CCCACTGGTG CGCGTGGTGC CCCGGGCGAC 1920

CGTGGTGAAC CGGGTCCGCC GGGCCCGGCG GGCTTTGCGG GCCCGCCAGG CGCTGACGGC 1980

CAGCCGGGTG CGAAAGGCGA ACCGGGGGAT GCGGGTGCTA AAGGCGACGC GGGTCCGCCG 2040

GGCCCTGCCG GCCCGGCGGG CCCGCCAGGC CCGATTGGCA ACGTGGGTGC GCCGGGTGCC 2100 

AAAGGTGCGC GCGGCAGCGC TGGTCCGCCG GGCGCGACCG GTTTCCCCGG TGCGGCGGGG 2160

CGCGTGGGTC CGCCAGGCCC GAGCGGTAAC GCGGGTCCGC CAGGTCCGCC TGGCCCGGCT 2220 

GGCAAAGAGG GCGGCAAAGG TCCGCGTGGT GAAACCGGCC CTGCGGGACG TCCAGGTGAA 2280 

GTGGGTCCGC CGGGCCCGCC GGGCCCGGCG GGCGAAAAAG GTAGCCCGGG TGCGGATGGT 2340 

CCCGCCGGTG CGCCAGGCAC GCCGGGTCCG CAAGGTATCG CTGGCCAGCG TGGTGTCGTC 2400 

GGGCTGCCGG GTCAGCGCGG CGAACGCGGC TTTCCGGGTC TGCCGGGCCC GAGCGGTGAG 2460 

CCGGGCAAAC AGGGTCCATC TGGCGCGAGC GGTGAACGTG GCCCGGCGGG TCCCATGGGC 2520 

CCGCCGGGTC TGGCGGGCCC TCCGGGTGAA AGCGGTCGTG AAGGCGCGCC GGGTGCCGAA 2580 

GGCAGCCCAG GCCGCGACGG TAGCCCGGGG GCCAAAGGGG ATCGTGGTGA AACCGGCCCG 2640 

GCGGGCCCCC CGGGTGCACC GGGCGCGCCG GGTGCCCCAG GCCCGGTGGG CCCGGCGGGC 2700

AAAAGCGGTG ATCGTGGTGA GACCGGTCCG GCGGGCCCGG CCGGTCCGGT GGGCCCAGCG 2760 

GGCGCCCGTG GCCCGGCCGG TCCGCAGGGC CCGCGGGGTG ACAAAGGTGA AACGGGCGAA 2820 

CAGGGCGACC GTGGCATTAA AGGCCACCGT GGCTTCAGCG GCCTGCAGGG TCCACCGGGC 2880 

CCGCCGGGCA GTCCGGGTGA ACAGGGTCCG TCCGGAGCCA GCGGGCCGGC GGGCCCACGC 2940

GGTCCGCCGG GCAGCGCGGG CGCGCCGGGC AAAGACGGTC TGAACGGTCT GCCGGGCCCG 3000 

ATCGGCCCGC CGGGCCCACG CGGCCGCACC GGTGATGCGG GTCCGGTGGG TCCCCCGGGC 3060

CCGCCGGGCC CGCCAGGCCC GCCGGGACCG CCGAGCGCGG GTTTCGACTT CAGCTTCCTG 3120 

CCGCAGCCGC CGCAGGAGAA AGCGCACGAC GGCGGTCGCT ACTACCGTGC G 3171 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1057 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro 
20 25 30 

Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45 

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro 
50 55 60 
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Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro 
65 70 75 80 

Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly
85 90 95

Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly Phe Ser Gly Leu
100 105 110 

Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro 
115 120 125 

Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly
130 135 140 

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala 
145 150 155 160 

Arg Gly Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr 
165 170 175 

Gly Pro Ala Gly Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly 
180 185 190 

Glu Ala Gly Pro Gln Gly Pro Arg Gly Ser Glu Gly Pro Gln Gly Val
195 200 205 

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala 
210 215 220 

Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly
225 230 235 240

Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro
245 250 255

Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser
260 265 270 

Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly 
275 280 285 

Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu
290 295 300 

Glu Gly Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro 
305 310 315 320 

Gly Pro Pro Gly Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly 
325 330 335 

Ala Asp Gly Val Ala Gly Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser 
340 345 350 

Pro Gly Pro Ala Gly Pro Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro 
355 360 365 

Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly 
370 375 380 

Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln
385 390 395 400 



Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala
405 410 415 
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Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly 
420 425 430 

Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro 
435 440 445 

Ala Gly Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala
450 455 460

Gly Pro Ala Gly Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly
465 470 475 480

Phe Gln Gly Leu Pro Gly Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys
485 490 495

Pro Gly Glu Gln Gly Val Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser
500 505 510

Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly Glu Arg Gly Val Gln Gly
515 520 525 

Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn 
530 535 540 

Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln
545 550 555 560

Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly
565 570 575 



Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala 
580 585 590

Asp Gly Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile
595 600 605

Gly Pro Pro Gly Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly 
610 615 620

Pro Ser Gly Pro Ala Gly Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp 
625 630 635 640

Arg Gly Glu Pro Gly Pro Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro 
645 650 655

Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly
660 665 670

Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro 
675 680 685

Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly Ala Arg
690 695 700 

Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly 
705 710 715 720 

Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro 
725 730 735 

Pro Gly Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr 
740 745 750

Gly Pro Ala Gly Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly 
755 760 765

Pro Ala Gly Glu Lys Gly Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala
770 775 780

Pro Gly Thr Pro Gly Pro Gln Gly Ile Ala Gly Gln Arg Gly Val Val
785 790 795 800

Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly
805 810 815

Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu
820 825 830 

Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro 
835 840 845 

Gly Glu Ser Gly Arg Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly 
850 855 860 

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro 
865 870 875 880 

Ala Gly Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val 
885 890 895 

Gly Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly 
900 905 910 

Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro 
915 920 925 

Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg
930 935 940

Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly
945 950 955 960

Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro
965 970 975 



Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp 
980 985 990

Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly
995 1000 1005

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro 
1010 1015 1020 

Pro Gly Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu
1025 1030 1035 1040 



Pro Gln Pro Pro Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg
1045 1050 1055 



Ala 



(2) INFORMATION FOR SEQ ID NO: 21: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 79 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
GGAATTCATG CAGCTGAGCT ATGGCTATGA TGAAAAAAGC ACCGGCGGCA TCAGCGTGCC 
GGGCCCGATG GGTCCGAGC 
(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 75 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GGCCCGGGCT ACCCAGGCTC GCCGGGCGCA CCGGACGGCC CGGGCGGTCC AGCGGGGCCA 
GCATTATTCG AACCC 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 81 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
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(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
GGAATTCCGG GTCCGCAGGG CTTTCAGGGT CCGCCGGGCG AACCTGGTGC GAGCGGCCCG 60 
ATGGGCCCGC GCGGCCCGCC C 81 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 87 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
TACCCGGGCG CGCCGGGCGG CCCAGGCGGT CCGTTTTTGC CGCTACTACC GTTCGCCCGT 60
TTGGCCCTGC AGGCATTATT CGAACCC 87

(2) INFORMATION FOR SEQ ID NO: 25: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 111 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
CAGCTGAGCT ATGGCTATGA TGAAAAAAGC ACCGGCGGCA TCAGCGTGCC GGGCCCGATG 
GGTCCGAGCG GCCCTCGTGG CCTGCCGGGC CCGCCAGGTG CGCCCGGTCC G 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro 
20 25 30 

Gly Ala Pro Gly Pro 
35 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 240 base pairs 

(B) TYPE: nucleic acid 
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(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

CAGCTGAGCT ATGGCTATGA TGAAAAAAGC ACCGGCGGCA TCAGCGTGCC GGGCCCGATG 60

GGTCCGAGCG GCCCTCGTGG CCTGCCGGGC CCGCCAGGTG CGCCCGGTCC GCAGGGCTTT 120

CAGGGTCCGC CGGGCGAACC GGGCGAACCT GGTGCGAGCG GCCCGATGGG CCCGCGCGGC 180

CCGCCGGGTC CGCCAGGCAA AAACGGCGAT GATGGCGAAG CGGGCAAACC GGGACGTCCG 240

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val
1 5 10 15

Pro Gly Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro
20 25 30

Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly
35 40 45

Glu Pro Gly Ala Ser Gly Pro Met Gly Pro Arg Gly Pro Pro Gly Pro
50 55 60

Pro Gly Lys Asn Gly Asp Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro
65 70 75 80
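
The correspondence between the 240-base listing of SEQ ID NO:27 and the 80-residue listing of SEQ ID NO:28 can be checked by translating the nucleic acid sequence codon by codon. The following is a minimal sketch only, assuming the standard genetic code and a reading frame that starts at base 1; the names `CODON_TABLE`, `translate` and `to_three_letter` are illustrative and are not part of the listing.

```python
# Illustrative sketch (not part of the listing): translate a DNA string with
# the standard genetic code, assuming the reading frame starts at base 1.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna: str) -> str:
    """Return one-letter amino acids for a DNA string; whitespace is ignored."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def to_three_letter(peptide: str) -> str:
    """Convert one-letter codes to the three-letter codes used in the listing."""
    return " ".join(THREE_LETTER[aa] for aa in peptide)


# First 60 bases of SEQ ID NO:27, copied from the listing.
SEQ27_FIRST_60 = (
    "CAGCTGAGCT ATGGCTATGA TGAAAAAAGC ACCGGCGGCA TCAGCGTGCC GGGCCCGATG"
)

if __name__ == "__main__":
    print(to_three_letter(translate(SEQ27_FIRST_60)))
```

Under these assumptions the first 60 bases translate to residues 1 to 20 of SEQ ID NO:28 (Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val Pro Gly Pro Met). The same check can be applied to the other nucleic acid and peptide pairs in the listing.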



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3120 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
CAGTATGATG GAAAAGGAGT TGGACTTGGC CCTGGACCAA TGGGCTTAAT GGGACCTAGA 60

GGCCCACCTG GTGCAGCTGG AGCCCCAGGC CCTCAAGGTT TCCAAGGACC TGCTGGTGAG 120

CCTGGTGAAC CTGGTCAAAC TGGTCCTGCA GGTGCTCGTG GTCCAGCTGG CCCTCCTGGC 180

AAGGCTGGTG AAGATGGTCA CCCTGGAAAA CCCGGACGAC CTGGTGAGAG AGGAGTTGTT 240

GGACCACAGG GTGCTCGTGG TTTCCCTGGA ACTCCTGGAC TTCCTGGCTT CAAAGGCATT 300

AGGGGACACA ATGGTCTGGA TGGATTGAAG GGACAGCCCG GTGCTCCTGG TGTGAAGGGT 360

GAACCTGGTG CCCCTGGTGA AAATGGAACT CCAGGTCAAA CAGGAGCCCG TGGGCTTCCT 420

GGTGAGAGAG GACGTGTTGG TGCCCCTGGC CCAGCTGGTG CCCGTGGCAG TGATGGAAGT 480

GTGGGTCCCG TGGGTCCTGC TGGTCCCATT GGGTCTGCTG GCCCTCCAGG CTTCCCAGGT 540 

GCCCCTGGCC CCAAGGGTGA AATTGGAGCT GTTGGTAACG CTGGTCCTGC TGGTCCCGCC 600 

GGTCCCCGTG GTGAAGTGGG TCTTCCAGGC CTCTCCGGCC CCGTTGGACC TCCTGGTAAT 660 

CCTGGAGCAA ACGGCCTTAC TGGTGCCAAG GGTGCTGCTG GCCTTCCCGG CGTTGCTGGG 720 

GCTCCCGGCC TCCCTGGACC CCGCGGTATT CCTGGCCCTG TTGGTGCTGC CGGTGCTACT 780 

GGTGCCAGAG GACTTGTTGG TGAGCCTGGT CCAGCTGGCT CCAAAGGAGA GAGCGGTAAC 840 

AAGGGTGAGC CCGGCTCTGC TGGGCCCCAA GGTCCTCCTG GTCCCAGTGG TGAAGAAGGA 900 

AAGAGAGGCC CTAATGGGGA AGCTGGATCT GCCGGCCCTC CAGGACCTCC TGGGCTGAGA 960 

GGTAGTCCTG GTTCTCGTGG TCTTCCTGGA GCTGATGGCA GAGCTGGCGT CATGGGCCCT 1020

CCTGGTAGTC GTGGTGCAAG TGGCCCTGCT GGAGTCCGAG GACCTAATGG AGATGCTGGT 1080

CGCCCTGGGG AGCCTGGTCT CATGGGACCC AGAGGTCTTC CTGGTTCCCC TGGAAATATC 1140 

GGCCCCGCTG GAAAAGAAGG TCCTGTCGGC CTCCCTGGCA TCGACGGCAG GCCTGGCCCA 1200 

ATTGGCCCAG CTGGAGCAAG AGGAGAGCCT GGCAACATTG GATTCCCTGG ACCCAAAGGC 1260 

CCCACTGGTG ATCCTGGCAA AAACGGTGAT AAAGGTCATG CTGGTCTTGC TGGTGCTCGG 1320

GGTGCTCCAG GTCCTGATGG AAACAATGGT GCTCAGGGAC CTCCTGGACC ACAGGGTGTT 1380 

CAAGGTGGAA AAGGTGAACA GGGTCCCGCT GGTCCTCCAG GCTTCCAGGG TCTGCCTGGC 1440

CCCTCAGGTC CCGCTGGTGA AGTTGGCAAA CCAGGAGAAA GGGGTCTCCA TGGTGAGTTT 1500

GGTCTCCCTG GTCCTGCTGG TCCAAGAGGG GAACGCGGTC CCCCAGGTGA GAGTGGTGCT 1560

GCCGGTCCTA CTGGTCCTAT TGGAAGCCGA GGTCCTTCTG GACCCCCAGG GCCTGATGGA 1620

AACAAGGGTG AACCTGGTGT GGTTGGTGCT GTGGGCACTG CTGGTCCATC TGGTCCTAGT 1680

GGACTCCCAG GAGAGAGGGG TGCTGCTGGC ATACCTGGAG GCAAGGGAGA AAAGGGTGAA 1740

CCTGGTCTCA GAGGTGAAAT TGGTAACCCT GGCAGAGATG GTGCTCGTGG TGCTCATGGT 1800

GCTGTAGGTG CCCCTGGTCC TGCTGGAGCC ACAGGTGACC GGGGCGAAGC TGGGGCTGCT 1860

GGTCCTGCTG GTCCTGCTGG TCCTCGGGGA AGCCCTGGTG AACGTGGCGA GGTCGGTCCT 1920

GCTGGCCCCA ACGGATTTGC TGGTCCGGCT GGTGCTGCTG GTCAACCGGG TGCTAAAGGA 1980

GAAAGAGGAG CCAAAGGGCC TAAGGGTGAA AACGGTGTTG TTGGTCCCAC AGGCCCCGTT 2040

GGAGCTGCTG GCCCAGCTGG TCCAAATGGT CCCCCCGGTC CTGCTGGAAG TCGTGGTGAT 2100

GGAGGCCCCC CTGGTATGAC TGGTTTCCCT GGTGCTGCTG GACGGACTGG TCCCCCAGGA 2160

CCCTCTGGTA TTTCTGGCCC TCCTGGTCCC CCTGGTCCTG CTGGGAAAGA AGGGCTTCGT 2220

GGTCCTCGTG GTGACCAAGG TCCAGTTGGC CGAACTGGAG AAGTAGGTGC AGTTGGTCCC 2280

CCTGGCTTCG CTGGTGAGAA GGGTCCCTCT GGAGAGGCTG GTACTGCTGG ACCTCCTGGC 2340

ACTCCAGGTC CTCAGGGTCT TCTTGGTGCT CCTGGTATTC TGGGTCTCCC TGGCTCGAGA 2400

GGTGAACGTG GTCTACCTGG TGTTGCTGGT GCTGTGGGTG AACCTGGTCC TCTTGGCATT 2460

GCCGGCCCTC CTGGGGCCCG TGGTCCTCCT GGTGCTGTGG GTAGTCCTGG AGTCAACGGT 2520

GCTCCTGGTG AAGCTGGTCG TGATGGCAAC CCTGGGAACG ATGGTCCCCC AGGTCGCGAT 2580

GGTCAACCCG GACACAAGGG AGAGCGCGGT TACCCTGGCA ATATTGGTCC CGTTGGTGCT 2640 

GCAGGTGCAC CTGGTCCTCA TGGCCCCGTG GGTCCTGCTG GCAAACATGG AAACCGTGGT 2700 

GAAACTGGTC CTTCTGGTCC TGTTGGTCCT GCTGGTGCTG TTGGCCCAAG AGGTCCTAGT 2760 

GGCCCACAAG GCATTCGTGG CGATAAGGGA GAGCCCGGTG AAAAGGGGCC CAGAGGTCTT 2820 

CCTGGCTTAA AGGGACACAA TGGATTGCAA GGTCTGCCTG GTATCGCTGG TCACCATGGT 2880 

GATCAAGGTG CTCCTGGCTC CGTGGGTCCT GCTGGTCCTA GGGGCCCTGC TGGTCCTTCT 2940 

GGCCCTGCTG GAAAAGATGG TCGCACTGGA CATCCTGGTA CGGTTGGACC TGCTGGCATT 3000 

CGAGGCCCTC AGGGTCACCA AGGCCCTGCT GGCCCCCCTG GTCCCCCTGG CCCTCCTGGA 3060 

CCTCCAGGTG TAAGCGGTGG TGGTTATGAC TTTGGTTACG ATGGAGACTT CTACAGGGCT 3120 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1040 amino acids 

(B) TYPE : amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Gln Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu
1 5 10 15

Met Gly Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln
20 25 30

Gly Phe Gln Gly Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly
35 40 45 

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Pro Gly Lys Ala Gly Glu 
50 55 60 

Asp Gly His Pro Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Val Val 
65 70 75 80 

Gly Pro Gln Gly Ala Arg Gly Phe Pro Gly Thr Pro Gly Leu Pro Gly
85 90 95

Phe Lys Gly Ile Arg Gly His Asn Gly Leu Asp Gly Leu Lys Gly Gln
100 105 110 

Pro Gly Ala Pro Gly Val Lys Gly Glu Pro Gly Ala Pro Gly Glu Asn 
115 120 125 

Gly Thr Pro Gly Gln Thr Gly Ala Arg Gly Leu Pro Gly Glu Arg Gly
130 135 140 

Arg Val Gly Ala Pro Gly Pro Ala Gly Ala Arg Gly Ser Asp Gly Ser 
145 150 155 160

Val Gly Pro Val Gly Pro Ala Gly Pro Ile Gly Ser Ala Gly Pro Pro
165 170 175

Gly Phe Pro Gly Ala Pro Gly Pro Lys Gly Glu Ile Gly Ala Val Gly
180 185 190

Asn Ala Gly Pro Ala Gly Pro Ala Gly Pro Arg Gly Glu Val Gly Leu 
195 200 205

Pro Gly Leu Ser Gly Pro Val Gly Pro Pro Gly Asn Pro Gly Ala Asn 
210 215 220 

Gly Leu Thr Gly Ala Lys Gly Ala Ala Gly Leu Pro Gly Val Ala Gly 
225 230 235 240 

Ala Pro Gly Leu Pro Gly Pro Arg Gly Ile Pro Gly Pro Val Gly Ala
245 250 255 

Ala Gly Ala Thr Gly Ala Arg Gly Leu Val Gly Glu Pro Gly Pro Ala 
260 265 270

Gly Ser Lys Gly Glu Ser Gly Asn Lys Gly Glu Pro Gly Ser Ala Gly 
275 280 285

Pro Gln Gly Pro Pro Gly Pro Ser Gly Glu Glu Gly Lys Arg Gly Pro
290 295 300

Asn Gly Glu Ala Gly Ser Ala Gly Pro Pro Gly Pro Pro Gly Leu Arg 
305 310 315 320

Gly Ser Pro Gly Ser Arg Gly Leu Pro Gly Ala Asp Gly Arg Ala Gly 
325 330 335 
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Val Met Gly Pro Pro Gly Ser Arg Gly Ala Ser Gly Pro Ala Gly Val 
340 345 350

Arg Gly Pro Asn Gly Asp Ala Gly Arg Pro Gly Glu Pro Gly Leu Met 
355 360 365

Gly Pro Arg Gly Leu Pro Gly Ser Pro Gly Asn Ile Gly Pro Ala Gly
370 375 380

Lys Glu Gly Pro Val Gly Leu Pro Gly Ile Asp Gly Arg Pro Gly Pro
385 390 395 400

Ile Gly Pro Ala Gly Ala Arg Gly Glu Pro Gly Asn Ile Gly Phe Pro
405 410 415 

Gly Pro Lys Gly Pro Thr Gly Asp Pro Gly Lys Asn Gly Asp Lys Gly 
420 425 430 

His Ala Gly Leu Ala Gly Ala Arg Gly Ala Pro Gly Pro Asp Gly Asn 
435 440 445 

Asn Gly Ala Gln Gly Pro Pro Gly Pro Gln Gly Val Gln Gly Gly Lys
450 455 460

Gly Glu Gln Gly Pro Ala Gly Pro Pro Gly Phe Gln Gly Leu Pro Gly
465 470 475 480

Pro Ser Gly Pro Ala Gly Glu Val Gly Lys Pro Gly Glu Arg Gly Leu 
485 490 495

His Gly Glu Phe Gly Leu Pro Gly Pro Ala Gly Pro Arg Gly Glu Arg 
500 505 510

Gly Pro Pro Gly Glu Ser Gly Ala Ala Gly Pro Thr Gly Pro Ile Gly
515 520 525

Ser Arg Gly Pro Ser Gly Pro Pro Gly Pro Asp Gly Asn Lys Gly Glu 
530 535 540

Pro Gly Val Val Gly Ala Val Gly Thr Ala Gly Pro Ser Gly Pro Ser 
545 550 555 560

Gly Leu Pro Gly Glu Arg Gly Ala Ala Gly Ile Pro Gly Gly Lys Gly
565 570 575

Glu Lys Gly Glu Pro Gly Leu Arg Gly Glu Ile Gly Asn Pro Gly Arg
580 585 590 

Asp Gly Ala Arg Gly Ala His Gly Ala Val Gly Ala Pro Gly Pro Ala 
595 600 605 

Gly Ala Thr Gly Asp Arg Gly Glu Ala Gly Ala Ala Gly Pro Ala Gly 
610 615 620

Pro Ala Gly Pro Arg Gly Ser Pro Gly Glu Arg Gly Glu Val Gly Pro 
625 630 635 640

Ala Gly Pro Asn Gly Phe Ala Gly Pro Ala Gly Ala Ala Gly Gln Pro
645 650 655

Gly Ala Lys Gly Glu Arg Gly Ala Lys Gly Pro Lys Gly Glu Asn Gly 
660 665 670

Val Val Gly Pro Thr Gly Pro Val Gly Ala Ala Gly Pro Ala Gly Pro 
675 680 685 
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Asn Gly Pro Pro Gly Pro Ala Gly Ser Arg Gly Asp Gly Gly Pro Pro 
690 695 700 

Gly Met Thr Gly Phe Pro Gly Ala Ala Gly Arg Thr Gly Pro Pro Gly 
705 710 715 720 

Pro Ser Gly Ile Ser Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Lys
725 730 735

Glu Gly Leu Arg Gly Pro Arg Gly Asp Gln Gly Pro Val Gly Arg Thr
740 745 750 

Gly Glu Val Gly Ala Val Gly Pro Pro Gly Phe Ala Gly Glu Lys Gly 
755 760 765 

Pro Ser Gly Glu Ala Gly Thr Ala Gly Pro Pro Gly Thr Pro Gly Pro 
770 775 780 

Gln Gly Leu Leu Gly Ala Pro Gly Ile Leu Gly Leu Pro Gly Ser Arg
785 790 795 800 

Gly Glu Arg Gly Leu Pro Gly Val Ala Gly Ala Val Gly Glu Pro Gly 
805 810 815 

Pro Leu Gly Ile Ala Gly Pro Pro Gly Ala Arg Gly Pro Pro Gly Ala
820 825 830 

Val Gly Ser Pro Gly Val Asn Gly Ala Pro Gly Glu Ala Gly Arg Asp 
835 840 845 

Gly Asn Pro Gly Asn Asp Gly Pro Pro Gly Arg Asp Gly Gln Pro Gly
850 855 860

His Lys Gly Glu Arg Gly Tyr Pro Gly Asn Ile Gly Pro Val Gly Ala
865 870 875 880

Ala Gly Ala Pro Gly Pro His Gly Pro Val Gly Pro Ala Gly Lys His
885 890 895

Gly Asn Arg Gly Glu Thr Gly Pro Ser Gly Pro Val Gly Pro Ala Gly 
900 905 910

Ala Val Gly Pro Arg Gly Pro Ser Gly Pro Gln Gly Ile Arg Gly Asp
915 920 925 

Lys Gly Glu Pro Gly Glu Lys Gly Pro Arg Gly Leu Pro Gly Leu Lys 
930 935 940 

Gly His Asn Gly Leu Gln Gly Leu Pro Gly Ile Ala Gly His His Gly
945 950 955 960

Asp Gln Gly Ala Pro Gly Ser Val Gly Pro Ala Gly Pro Arg Gly Pro
965 970 975 

Ala Gly Pro Ser Gly Pro Ala Gly Lys Asp Gly Arg Thr Gly His Pro 
980 985 990 

Gly Thr Val Gly Pro Ala Gly Ile Arg Gly Pro Gln Gly His Gln Gly
995 1000 1005 

Pro Ala Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Val 
1010 1015 1020 

Ser Gly Gly Gly Tyr Asp Phe Gly Tyr Asp Gly Asp Phe Tyr Arg Ala 
1025 1030 1035 1040 
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(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3120 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

CAGTACGACG GTAAAGGCGT AGGCCTGGGT CCGGGTCCGA TGGGCCTGAT GGGTCCACGT 60

GGCCCACCGG GTGCAGCAGG TGCGCCGGGT CCGCAGGGCT TCCAAGGTCC GGCGGGTGAA 120

CCGGGCGAAC CGGGTCAGAC GGGTCCGGCG GGTGCTCGCG GTCCGGCTGG CCCACCGGGC 180

AAAGCTGGCG AAGACGGTCA CCCGGGTAAG CCAGGCCGCC CGGGCGAACG TGGCGTCGTG 240

GGTCCGCAAG GTGCGCGTGG TTTCCCGGGC ACGCCGGGTC TGCCGGGTTT CAAAGGCATT 300

CGTGGTCACA ACGGTCTGGA CGGTCTGAAA GGCCAACCGG GTGCTCCGGG CGTCAAAGGC 360

GAACCGGGTG CCCCAGGCGA AAACGGTACG CCGGGCCAGA CTGGTGCGCG TGGTCTGCCG 420

GGTGAACGCG GCCGTGTTGG CGCTCCGGGT CCGGCTGGCG CGCGTGGCAG CGATGGCTCC 480

GTCGGTCCGG TTGGCCCTGC GGGTCCGATT GGTTCCGCTG GCCCTCCGGG TTTCCCGGGT 540

GCGCCGGGTC CGAAGGGTGA GATCGGCGCG GTTGGCAACG CAGGCCCGGC TGGTCCAGCC 600

GGCCCTCGTG GCGAAGTCGG TCTGCCGGGT CTGAGCGGTC CGGTAGGCCC ACCGGGTAAC 660

CCGGGCGCAA ACGGCCTGAC GGGTGCAAAA GGTGCGGCTG GCCTGCCGGG CGTTGCCGGT 720

GCCCCGGGCC TGCCGGGTCC GCGCGGTATT CCGGGTCCGG TAGGCGCAGC CGGTGCAACT 780 

GGTGCCCGTG GCCTGGTTGG CGAACCGGGT CCGGCGGGTT CTAAAGGCGA AAGCGGTAAC 840 

AAAGGTGAGC CGGGTTCCGC GGGCCCGCAG GGTCCGCCGG GTCCGAGCGG CGAAGAAGGT 900 

AAACGTGGTC CGAACGGCGA GGCTGGTTCC GCAGGCCCTC CGGGTCCGCC GGGTCTGCGT 960 

GGCAGCCCGG GTAGCCGTGG CCTGCCGGGC GCGGACGGCC GTGCGGGCGT GATGGGTCCG 1020 

CCGGGTTCCC GTGGTGCCTC TGGTCCGGCT GGTGTCCGTG GTCCGAATGG CGACGCGGGC 1080 

CGTCCGGGTG AACCGGGCCT GATGGGTCCG CGTGGCCTGC CGGGTAGCCC GGGTAACATT 1140

GGTCCGGCGG GTAAGGAGGG TCCGGTAGGT CTGCCGGGTA TTGATGGTCG TCCGGGTCCG 1200 

ATCGGCCCTG CGGGCGCTCG TGGCGAGCCG GGTAACATCG GTTTTCCGGG TCCGAAGGGT 1260 

CCGACGGGCG ACCCGGGCAA GAACGGTGAT AAAGGCCATG CAGGTCTGGC AGGTGCCCGT 1320 

GGTGCACCGG GTCCGGATGG TAACAATGGT GCGCAGGGTC CGCCGGGTCC GCAGGGCGTA 1380 

CAGGGTGGCA AAGGTGAACA GGGTCCGGCA GGCCCACCGG GCTTCCAGGG TCTGCCGGGT 1440

CCGAGCGGCC CGGCTGGTGA AGTGGGCAAA CCGGGCGAAC GTGGCCTCCA TGGCGAGTTT 1500 

GGCCTGCCGG GTCCGGCCGG TCCGCGTGGT GAGCGCGGCC CTCCGGGCGA ATCCGGCGCG 1560 

GCAGGTCCGA CCGGCCCGAT TGGTTCCCGT GGTCCGAGCG GCCCACCGGG TCCGGACGGC 1620 

AACAAAGGCG AGCCGGGTGT TGTTGGTGCT GTTGGTACCG CCGGCCCGTC TGGTCCGAGC 1680

GGTCTGCCGG GCGAACGCGG TGCCGCTGGT ATTCCGGGCG GCAAAGGTGA AAAAGGTGAA 1740

CCGGGTCTGC GCGGTGAGAT TGGCAACCCG GGCCGTGACG GTGCTCGCGG TGCACACGGC 1800

GCGGTTGGCG CACCGGGTCC GGCAGGCGCG ACTGGTGATC GTGGCGAAGC TGGTGCAGCG 1860

GGTCCGGCGG GTCCGGCCGG CCCTCGCGGT TCCCCGGGCG AACGCGGCGA AGTCGGCCCG 1920

GCTGGCCCGA ATGGCTTTGC TGGCCCAGCG GGCGCTGCGG GCCAACCGGG TGCGAAAGGT 1980

GAGCGCGGTG CCAAAGGCCC GAAAGGTGAA AATGGTGTAG TTGGTCCGAC GGGTCCGGTT 2040

GGTGCGGCTG GTCCGGCTGG CCCGAATGGT CCGCCGGGTC CGGCAGGCAG CCGTGGCGAT 2100

GGTGGCCCAC CGGGCATGAC CGGTTTCCCT GGCGCGGCCG GTCGCACCGG CCCGCCGGGT 2160

CCGTCTGGCA TTTCTGGCCC ACCGGGTCCG CCGGGTCCGG CGGGCAAAGA AGGTCTGCGT 2220

GGCCCACGCG GCGACCAGGG TCCGGTGGGC CGTACCGGCG AAGTCGGTGC TGTTGGCCCT 2280

CCGGGCTTTG CGGGTGAGAA AGGTCCGAGC GGTGAAGCTG GCACCGCAGG CCCGCCGGGT 2340

ACGCCGGGTC CGCAAGGTCT GCTGGGTGCT CCGGGTATCC TGGGCCTGCC GGGCTCCCGT 2400

GGCGAACGCG GTCTGCCGGG CGTTGCAGGC GCTGTAGGCG AACCGGGTCC GCTGGGTATC 2460

GCGGGTCCGC CGGGTGCGCG TGGTCCGCCG GGTGCCGTGG GCTCTCCGGG TGTTAACGGC 2520

GCCCCTGGTG AAGCGGGCCG CGACGGCAAT CCGGGCAACG ATGGTCCGCC GGGTCGTGAT 2580

GGTCAGCCGG GTCACAAAGG TGAGCGTGGC TACCCGGGTA ACATCGGTCC GGTTGGTGCG 2640

GCCGGCGCTC CGGGTCCGCA CGGTCCGGTA GGCCCAGCCG GCAAACACGG TAACCGTGGT 2700

GAAACGGGTC CGTCCGGTCC GGTAGGTCCG GCGGGTGCTG TTGGTCCACG CGGCCCGTCC 2760

GGCCCGCAGG GTATTCGCGG TGACAAAGGC GAACCGGGCG AAAAAGGTCC GCGTGGTCTG 2820

CCGGGCCTTA AGGGCCACAA CGGTCTGCAA GGTCTGCCGG GTATCGCGGG TCACCACGGT 2880

GATCAGGGTG CTCCGGGTTC CGTTGGTCCG GCCGGTCCGC GTGGCCCGGC TGGTCCGTCT 2940

GGTCCGGCCG GTAAAGACGG CCGTACGGGC CACCCGGGTA CGGTGGGTCC GGCCGGCATT 3000

CGCGGTCCGC AAGGTCACCA GGGTCCGGCG GGTCCGCCGG GTCCGCCGGG TCCGCCGGGT 3060

CCGCCGGGTG TTAGCGGTGG CGGTTATGAT TTTGGTTATG ACGGTGATTT CTATCGTGCG 3120

(2) INFORMATION FOR SEQ ID NO: 32: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1040 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:

Gln Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu
1 5 10 15

Met Gly Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln
20 25 30

Gly Phe Gln Gly Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly
35 40 45 

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Pro Gly Lys Ala Gly Glu 
50 55 60 

Asp Gly His Pro Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Val Val 
65 70 75 80 

Gly Pro Gln Gly Ala Arg Gly Phe Pro Gly Thr Pro Gly Leu Pro Gly
85 90 95

Phe Lys Gly Ile Arg Gly His Asn Gly Leu Asp Gly Leu Lys Gly Gln
100 105 110 

Pro Gly Ala Pro Gly Val Lys Gly Glu Pro Gly Ala Pro Gly Glu Asn 
115 120 125 

Gly Thr Pro Gly Gln Thr Gly Ala Arg Gly Leu Pro Gly Glu Arg Gly
130 135 140

Arg Val Gly Ala Pro Gly Pro Ala Gly Ala Arg Gly Ser Asp Gly Ser
145 150 155 160

Val Gly Pro Val Gly Pro Ala Gly Pro Ile Gly Ser Ala Gly Pro Pro
165 170 175

Gly Phe Pro Gly Ala Pro Gly Pro Lys Gly Glu Ile Gly Ala Val Gly
180 185 190 
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Asn Ala Gly Pro Ala Gly Pro Ala Gly Pro Arg Gly Glu Val Gly Leu 
195 200 205 

Pro Gly Leu Ser Gly Pro Val Gly Pro Pro Gly Asn Pro Gly Ala Asn 
210 215 220 

Gly Leu Thr Gly Ala Lys Gly Ala Ala Gly Leu Pro Gly Val Ala Gly 
225 230 235 240 

Ala Pro Gly Leu Pro Gly Pro Arg Gly Ile Pro Gly Pro Val Gly Ala
245 250 255 

Ala Gly Ala Thr Gly Ala Arg Gly Leu Val Gly Glu Pro Gly Pro Ala 
260 265 270 

Gly Ser Lys Gly Glu Ser Gly Asn Lys Gly Glu Pro Gly Ser Ala Gly 
275 280 285 

Pro Gln Gly Pro Pro Gly Pro Ser Gly Glu Glu Gly Lys Arg Gly Pro
290 295 300 

Asn Gly Glu Ala Gly Ser Ala Gly Pro Pro Gly Pro Pro Gly Leu Arg 
305 310 315 320 

Gly Ser Pro Gly Ser Arg Gly Leu Pro Gly Ala Asp Gly Arg Ala Gly 
325 330 335 

Val Met Gly Pro Pro Gly Ser Arg Gly Ala Ser Gly Pro Ala Gly Val 
340 345 350 

Arg Gly Pro Asn Gly Asp Ala Gly Arg Pro Gly Glu Pro Gly Leu Met 
355 360 365

Gly Pro Arg Gly Leu Pro Gly Ser Pro Gly Asn Ile Gly Pro Ala Gly
370 375 380

Lys Glu Gly Pro Val Gly Leu Pro Gly Ile Asp Gly Arg Pro Gly Pro
385 390 395 400

Ile Gly Pro Ala Gly Ala Arg Gly Glu Pro Gly Asn Ile Gly Phe Pro
405 410 415 

Gly Pro Lys Gly Pro Thr Gly Asp Pro Gly Lys Asn Gly Asp Lys Gly 
420 425 430 

His Ala Gly Leu Ala Gly Ala Arg Gly Ala Pro Gly Pro Asp Gly Asn 
435 440 445 

Asn Gly Ala Gln Gly Pro Pro Gly Pro Gln Gly Val Gln Gly Gly Lys
450 455 460

Gly Glu Gln Gly Pro Ala Gly Pro Pro Gly Phe Gln Gly Leu Pro Gly
465 470 475 480 

Pro Ser Gly Pro Ala Gly Glu Val Gly Lys Pro Gly Glu Arg Gly Leu 
485 490 495 

His Gly Glu Phe Gly Leu Pro Gly Pro Ala Gly Pro Arg Gly Glu Arg 
500 505 510 

Gly Pro Pro Gly Glu Ser Gly Ala Ala Gly Pro Thr Gly Pro Ile Gly
515 520 525 

Ser Arg Gly Pro Ser Gly Pro Pro Gly Pro Asp Gly Asn Lys Gly Glu 
530 535 540 
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Pro Gly Val Val Gly Ala Val Gly Thr Ala Gly Pro Ser Gly Pro Ser 
545 550 555 560 

Gly Leu Pro Gly Glu Arg Gly Ala Ala Gly Ile Pro Gly Gly Lys Gly
565 570 575

Glu Lys Gly Glu Pro Gly Leu Arg Gly Glu Ile Gly Asn Pro Gly Arg
580 585 590 

Asp Gly Ala Arg Gly Ala His Gly Ala Val Gly Ala Pro Gly Pro Ala 
595 600 605 

Gly Ala Thr Gly Asp Arg Gly Glu Ala Gly Ala Ala Gly Pro Ala Gly 
610 615 620 

Pro Ala Gly Pro Arg Gly Ser Pro Gly Glu Arg Gly Glu Val Gly Pro 
625 630 635 640 

Ala Gly Pro Asn Gly Phe Ala Gly Pro Ala Gly Ala Ala Gly Gln Pro
645 650 655 

Gly Ala Lys Gly Glu Arg Gly Ala Lys Gly Pro Lys Gly Glu Asn Gly 
660 665 670 

Val Val Gly Pro Thr Gly Pro Val Gly Ala Ala Gly Pro Ala Gly Pro 
675 680 685 

Asn Gly Pro Pro Gly Pro Ala Gly Ser Arg Gly Asp Gly Gly Pro Pro 
690 695 700 

Gly Met Thr Gly Phe Pro Gly Ala Ala Gly Arg Thr Gly Pro Pro Gly 
705 710 715 720

Pro Ser Gly Ile Ser Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Lys
725 730 735

Glu Gly Leu Arg Gly Pro Arg Gly Asp Gln Gly Pro Val Gly Arg Thr
740 745 750

Gly Glu Val Gly Ala Val Gly Pro Pro Gly Phe Ala Gly Glu Lys Gly 
755 760 765

Pro Ser Gly Glu Ala Gly Thr Ala Gly Pro Pro Gly Thr Pro Gly Pro
770 775 780

Gln Gly Leu Leu Gly Ala Pro Gly Ile Leu Gly Leu Pro Gly Ser Arg
785 790 795 800 

Gly Glu Arg Gly Leu Pro Gly Val Ala Gly Ala Val Gly Glu Pro Gly 
805 810 815 

Pro Leu Gly Ile Ala Gly Pro Pro Gly Ala Arg Gly Pro Pro Gly Ala
820 825 830 

Val Gly Ser Pro Gly Val Asn Gly Ala Pro Gly Glu Ala Gly Arg Asp 
835 840 845

Gly Asn Pro Gly Asn Asp Gly Pro Pro Gly Arg Asp Gly Gln Pro Gly
850 855 860

His Lys Gly Glu Arg Gly Tyr Pro Gly Asn Ile Gly Pro Val Gly Ala
865 870 875 880

Ala Gly Ala Pro Gly Pro His Gly Pro Val Gly Pro Ala Gly Lys His 
885 890 895

Gly Asn Arg Gly Glu Thr Gly Pro Ser Gly Pro Val Gly Pro Ala Gly
900 905 910

Ala Val Gly Pro Arg Gly Pro Ser Gly Pro Gln Gly Ile Arg Gly Asp
915 920 925 

Lys Gly Glu Pro Gly Glu Lys Gly Pro Arg Gly Leu Pro Gly Leu Lys 
930 935 940 

Gly His Asn Gly Leu Gln Gly Leu Pro Gly Ile Ala Gly His His Gly
945 950 955 960

Asp Gln Gly Ala Pro Gly Ser Val Gly Pro Ala Gly Pro Arg Gly Pro
965 970 975 

Ala Gly Pro Ser Gly Pro Ala Gly Lys Asp Gly Arg Thr Gly His Pro 
980 985 990 

Gly Thr Val Gly Pro Ala Gly Ile Arg Gly Pro Gln Gly His Gln Gly
995 1000 1005 

Pro Ala Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Val 
1010 1015 1020 



Ser Gly Gly Gly Tyr Asp Phe Gly Tyr Asp Gly Asp Phe Tyr Arg Ala
1025 1030 1035 1040

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 76 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
GGAATTCATG CAGTATGATG GCAAAGGCGT CGGCCTCGGC CCGGGCCCAA TGGGCCTCAT 
GGGCCCGCGC GGCCCA 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 79 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
CCGGGCGCGC CGGGTGGCCC ACGTCGACCG CGGGGTCCGG GCGTTCCAAA GGTCCCGGGA 
CGGCCAATTA TTCGAACCC 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 82 base pairs

(B) TYPE: nucleic acid 
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(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GGAATTCGCC GGTGAGCCGG GTGAACCGGG CCAAACGGGT CCGGCAGGTC CACGTGGTCC 60 
AGCGGGCCCG CCTGGCAAGG CG 82 
(2) INFORMATION FOR SEQ ID NO: 36: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 84 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
CCGGGCGGAC CGTTCCGCCC ACTTCTACCG GTGGGACCGT TTGGCCCGGC GGGCCACTCG 60 
CACCGCATCA CATTATTCGA ACCC 84

(2) INFORMATION FOR SEQ ID NO: 37: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 240 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
CAGTATGATG GCAAAGGCGT CGGCCTCGGC CCGGGCCCAA TGGGCCTCAT GGGCCCGCGC 
GGCCCACCGG GTGCAGCTGG CGCCCCAGGC CCGCAAGGTT TCCAGGGCCC TGCCGGTGAG 
CCGGGTGAAC CGGGCCAAAC GGGTCCGGCA GGTGCACGTG GTCCAGCGGG CCCGCCTGGC 
AAGGCGGGTG AAGATGGCCA CCCTGGCAAA CCGGGCCGCC CGGGTGAGCG TGGCGTAGTG 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 amino acids 

(B) TYPE : amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

Gln Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu
1 5 10 15

Met Gly Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln
20 25 30

Gly Phe Gln Gly Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly
35 40 45

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Pro Gly Lys Ala Gly Glu
50 55 60

Asp Gly His Pro Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Val Val
65 70 75 80



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 276 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
ATGGGGCTCG CTGGCCCACC GGGCGAACCG GGTCCGCCAG GCCCGAAAGG TCCGCGTGGC 60
GATAGCGGGC TCGCTGGCCC ACCGGGCGAA CCGGGTCCGC CAGGCCCGAA AGGTCCGCGT 120
GGCGATAGCG GGCTCGCTGG CCCACCGGGC GAACCGGGTC CGCCAGGCCC GAAAGGTCCG 180
CGTGGCGATA GCGGGCTCGC TGGCCCACCG GGCGAACCGG GTCCGCCAGG CCCGAAAGGT 240
CCGCGTGGCG ATAGCGGGCT CCCGGGCGAT TCCTAA 276

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 91 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 

Met Gly Leu Ala Gly Pro Pro Gly Glu Pro Gly Pro Pro Gly Pro Lys 
1 5 10 15

Gly Pro Arg Gly Asp Ser Gly Leu Ala Gly Pro Pro Gly Glu Pro Gly 
20 25 30 

Pro Pro Gly Pro Lys Gly Pro Arg Gly Asp Ser Gly Leu Ala Gly Pro 
35 40 45 

Pro Gly Glu Pro Gly Pro Pro Gly Pro Lys Gly Pro Arg Gly Asp Ser 
50 55 60 

Gly Leu Ala Gly Pro Pro Gly Glu Pro Gly Pro Pro Gly Pro Lys Gly 
65 70 75 80 

Pro Arg Gly Asp Ser Gly Leu Pro Gly Asp Ser 
85 90 
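
The correspondence between SEQ ID NO: 39 (nucleic acid) and SEQ ID NO: 40 (peptide) can be checked by translating the listed bases with the standard genetic code. The following is a minimal Python sketch under that assumption; the names `CODON_TABLE`, `translate`, `SEQ39` and `protein` are hypothetical and introduced only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop codon)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate codon by codon from the first base, stopping at the first stop codon."""
    residues = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


# SEQ ID NO: 39 as printed in the listing (block spacing removed below)
SEQ39 = (
    "ATGGGGCTCG CTGGCCCACC GGGCGAACCG GGTCCGCCAG GCCCGAAAGG TCCGCGTGGC "
    "GATAGCGGGC TCGCTGGCCC ACCGGGCGAA CCGGGTCCGC CAGGCCCGAA AGGTCCGCGT "
    "GGCGATAGCG GGCTCGCTGG CCCACCGGGC GAACCGGGTC CGCCAGGCCC GAAAGGTCCG "
    "CGTGGCGATA GCGGGCTCGC TGGCCCACCG GGCGAACCGG GTCCGCCAGG CCCGAAAGGT "
    "CCGCGTGGCG ATAGCGGGCT CCCGGGCGAT TCCTAA"
).replace(" ", "")

protein = translate(SEQ39)
print(len(SEQ39), len(protein))
print(protein)
```

The one-letter output can be compared residue by residue with the three-letter listing of SEQ ID NO: 40.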

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 

Gly Pro Pro Gly Leu Ala Gly Pro Pro Gly Glu Ser Gly 
1 5 10

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 



(ix) FEATURE: 

(A) NAME/KEY: Modified-site

(B) LOCATION: 2..3

(D) OTHER INFORMATION: /product= "4-hydroxyproline"

(ix) FEATURE:

(A) NAME/KEY: Modified-site

(B) LOCATION: 8..9

(D) OTHER INFORMATION: /product= "Xaa = 4-hydroxyproline"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:

Gly Xaa Xaa Gly Leu Ala Gly Xaa Xaa Gly Glu Ser Gly
1 5 10

(2) INFORMATION FOR SEQ ID NO:43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 660 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 

ATGGGCCCGC CGGGTCTGGC GGGCCCTCCG GGTGAAAGCG GTCGTGAAGG CGCGCCGGGT 60 

GCCGAAGGCA GCCCAGGCCG CGACGGTAGC CCGGGGGCCA AAGGGGATCG TGGTGAAACC 120 

GGCCCGGCGG GCCCCCCGGG TGCACCGGGC GCGCCGGGTG CCCCAGGCCC GGTGGGCCCG 180 

GCGGGCAAAA GCGGTGATCG TGGTGAGACC GGTCCGGCGG GCCCGGCCGG TCCGGTGGGC 240 

CCAGCGGGCG CCCGTGGCCC GGCCGGTCCG CAGGGCCCGC GGGGTGACAA AGGTGAAACG 300 

GGCGAACAGG GCGACCGTGG CATTAAAGGC CACCGTGGCT TCAGCGGCCT GCAGGGTCCA 360 

CCGGGCCCGC CGGGCAGTCC GGGTGAACAG GGTCCGTCCG GAGCCAGCGG GCCGGCGGGC 420

CCACGCGGTC CGCCGGGCAG CGCGGGCGCG CCGGGCAAAG ACGGTCTGAA CGGTCTGCCG 480 

GGCCCGATCG GCCCGCCGGG CCCACGCGGC CGCACCGGTG ATGCGGGTCC GGTGGGTCCC 540 

CCGGGCCCGC CGGGCCCGCC AGGCCCGCCG GGACCGCCGA GCGCGGGTTT CGACTTCAGC 600 

TTCCTGCCGC AGCCGCCGCA GGAGAAAGCG CACGACGGCG GTCGCTACTA CCGTGCGTAA 660 



(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 219 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:

Met Gly Pro Pro Gly Leu Ala Gly Pro Pro Gly Glu Ser Gly Arg Glu
1 5 10 15

Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly Arg Asp Gly Ser Pro Gly
20 25 30

Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro Pro Gly Ala
35 40 45

Pro Gly Ala Pro Gly Ala Pro Gly Pro Val Gly Pro Ala Gly Lys Ser
50 55 60

Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro Ala Gly Pro Val Gly
65 70 75 80

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Gln Gly Pro Arg Gly Asp
85 90 95

Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg Gly Ile Lys Gly His Arg
100 105 110

Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly Pro Pro Gly Ser Pro Gly
115 120 125

Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro Ala Gly Pro Arg Gly Pro
130 135 140

Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp Gly Leu Asn Gly Leu Pro
145 150 155 160

Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly Arg Thr Gly Asp Ala Gly
165 170 175

Pro Val Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro
180 185 190

Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu Pro Gln Pro Pro Gln Glu
195 200 205 

Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg Ala 
210 215 

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 627 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

ATGGGCTCTC CGGGTGTTAA CGGCGCCCCT GGTGAAGCGG GCCGCGACGG CAATCCGGGC 60 

AACGATGGTC CGCCGGGTCG TGATGGTCAG CCGGGTCACA AAGGTGAGCG TGGCTACCCG 120 

GGTAACATCG GTCCGGTTGG TGCGGCCGGC GCTCCGGGTC CGCACGGTCC GGTAGGCCCA 180 

GCCGGCAAAC ACGGTAACCG TGGTGAAACG GGTCCGTCCG GTCCGGTAGG TCCGGCGGGT 240 

GCTGTTGGTC CACGCGGCCC GTCCGGCCCG CAGGGTATTC GCGGTGACAA AGGCGAACCG 300 

GGCGAAAAAG GTCCGCGTGG TCTGCCGGGC CTTAAGGGCC ACAACGGTCT GCAAGGTCTG 360 

CCGGGTATCG CGGGTCACCA CGGTGATCAG GGTGCTCCGG GTTCCGTTGG TCCGGCCGGT 420 

CCGCGTGGCC CGGCTGGTCC GTCTGGTCCG GCCGGTAAAG ACGGCCGTAC GGGCCACCCG 480 

GGTACGGTGG GTCCGGCCGG CATTCGCGGT CCGCAAGGTC ACCAGGGTCC GGCGGGTCCG 540 

CCGGGTCCGC CGGGTCCGCC GGGTCCGCCG GGTGTTAGCG GTGGCGGTTA TGATTTTGGT 600 

TATGACGGTG ATTTCTATCG TGCGTAA 627 

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 219 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: 

Met Gly Pro Pro Gly Leu Ala Gly Pro Pro Gly Glu Ser Gly Arg Glu 
1 5 10 15

Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly Arg Asp Gly Ser Pro Gly
20 25 30

Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro Pro Gly Ala
35 40 45

Pro Gly Ala Pro Gly Ala Pro Gly Pro Val Gly Pro Ala Gly Lys Ser
50 55 60

Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro Ala Gly Pro Val Gly
65 70 75 80

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Gln Gly Pro Arg Gly Asp
85 90 95

Lys Gly Glu Thr Gly Glu Gln Gly Asp Arg Gly Ile Lys Gly His Arg
100 105 110

Gly Phe Ser Gly Leu Gln Gly Pro Pro Gly Pro Pro Gly Ser Pro Gly
115 120 125

Glu Gln Gly Pro Ser Gly Ala Ser Gly Pro Ala Gly Pro Arg Gly Pro
130 135 140

Pro Gly Ser Ala Gly Ala Pro Gly Lys Asp Gly Leu Asn Gly Leu Pro
145 150 155 160

Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly Arg Thr Gly Asp Ala Gly
165 170 175

Pro Val Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro
180 185 190

Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu Pro Gln Pro Pro Gln Glu
195 200 205 

Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg Ala 
210 215 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 95 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:
GGAATTCTCC CATGGGCCCG CCGGGTCTGG CGGGCCCTCC GGGTGAAAGC GGTCGTGAAG 60
GCGCGCCGGG TGCCGAAGGC AGCCCAGGCC GCGAC 95
(2) INFORMATION FOR SEQ ID NO: 48: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 97 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: 



CTTCCGTCGG GTCCGGCGCT GCCATCGGGC CCCCGGTTTC CCCTAGCACC ACTTTGGCCG 60 



GGCCGCCCGG GGGGCCCACG TGGCATTATT CGAACCC 97 



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 91 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
GGAATTCGGT GCACCGGGCG CGCCGGGTGC CCCAGGCCCG GTGGGCCCGG CGGGCAAAAG 60 
CGGTGATCGT GGCGAGACCG GTCCGGCGGG C 91 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 91 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
CTCTGGCCAG GCCGCCCGGG CCGGCCAGGC CACCCGGGTC GCCCGCGGGC ACCGGGCCGG 60 
CCAGGCGTCC CGGGCGCCAT TATTCGAACC C 91 



Claims 



1. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof capable of providing a self-aggregate
in a cell which does not ordinarily hydroxylate proline comprising

providing a nucleic acid sequence encoding the EMP or fragment thereof which has been optimized for expression
in the cell by substitution of codons preferred by the cell for naturally occurring codons not preferred
by the cell;

incorporating the nucleic acid sequence into the cell;

providing hypertonic growth media containing at least one amino acid selected from the group consisting of
trans-4-hydroxyproline and 3-hydroxyproline; and

contacting the cell with the growth media wherein the at least one amino acid is assimilated into the cell and
incorporated into the EMP or fragment thereof.

2. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 1 wherein the 
EMP is selected from the group consisting of human collagen, fibrinogen, fibronectin and collagen-like peptide. 

3. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 1 or 2, wherein
the cell is a prokaryote.

4. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 3, wherein
the prokaryote is E. coli.

5. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 2 to 4,
wherein the human collagen is Type I (α1).

6. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 5, wherein
the nucleic acid encoding human collagen Type I (α1) includes the sequence shown in SEQ.ID.NO.19.

7. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 2 to 4,
wherein the human collagen is Type I (α2).

8. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 7, wherein
the nucleic acid encoding human collagen Type I (α2) includes the sequence shown in SEQ.ID.NO.31.

9. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 1 to 
8, wherein the nucleic acid encoding the EMP includes the sequence shown in SEQ.ID.NO. 43. 

10. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 1 to 
8, wherein the nucleic acid encoding the EMP includes the sequence shown in SEQ.ID.NO. 46. 

11. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 1 to 
10, wherein the nucleic acid sequence includes nucleic acid encoding a physiologically active peptide. 

12. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 11, wherein
the physiologically active peptide is selected from the group consisting of bone morphogenic protein, transforming
growth factor-β and decorin.

13. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to any of claims 1 to
4, wherein the EMP or fragment thereof is a collagen-like peptide.

14. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 13, wherein 
the EMP or fragment thereof includes the amino acid sequence depicted in SEQ.ID.NO. 4. 

15. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 13, wherein 
the EMP includes the amino acid sequence depicted in SEQ.ID.NO.40. 

16. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 1, wherein 
the EMP includes the amino acid sequence depicted in SEQ.ID.NO. 44. 

17. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 1, wherein 
the EMP is a collagen fragment including the amino acid sequence depicted in SEQ.ID.NO. 26. 

18. A method of producing an Extracellular Matrix Protein (EMP) or fragment thereof according to claim 1, wherein 
the EMP is a collagen fragment including the amino acid sequence depicted in SEQ.ID.NO. 46. 

19. Nucleic acid encoding a chimeric protein comprising a domain from a physiologically active peptide and a domain 
from an Extracellular Matrix Protein (EMP) which is capable of providing a self-aggregate. 

20. Nucleic acid encoding a chimeric protein according to claim 19, wherein said EMP is selected from the group 
consisting of human collagen, fibrinogen, fibronectin and collagen-like peptide. 

21. Nucleic acid encoding a chimeric protein according to claim 19 or 20 wherein said domain from a physiologically
active peptide is selected from the group consisting of bone morphogenic protein, transforming growth factor-β
and decorin.

22. Nucleic acid encoding a chimeric protein according to any of claims 19-21, wherein said chimeric protein includes
the sequence shown in SEQ.ID.NO.6.

23. Nucleic acid encoding a chimeric protein according to any of claims 19-21, wherein said chimeric protein includes
the sequence shown in SEQ.ID.NO.8.

24. Nucleic acid encoding a chimeric protein according to any of claims 19-21, wherein said chimeric protein includes
the sequence shown in SEQ.ID.NO.11.

25. Nucleic acid encoding a chimeric protein according to any of claims 19-21, wherein said chimeric protein includes
the sequence shown in SEQ.ID.NO.10.

26. A cloning vector comprising nucleic acid according to any of claims 19-21. 

27. A cloning vector according to claim 26 wherein said cloning vector is selected from the group consisting of plasmid, 
phage, cosmid and artificial chromosome. 

28. A cell transformed by a vector according to claim 26 or 27. 

29. A chimeric protein comprising a domain from a physiologically active peptide and a domain from an Extracellular 
Matrix Protein (EMP) which is capable of providing a self-aggregate. 

30. A chimeric protein according to claim 29 wherein said EMP is selected from the group consisting of human collagen, 
fibrinogen, fibronectin and collagen-like peptide. 

31. A chimeric protein according to claim 29 or 30 wherein said domain from a physiologically active peptide is selected
from the group consisting of bone morphogenic protein, transforming growth factor-β and decorin.

32. A chimeric protein according to any of claims 29-31, wherein said chimeric protein includes the sequence shown
in SEQ.ID.NO.6.

33. A chimeric protein according to any of claims 29-31, wherein said chimeric protein includes the sequence shown
in SEQ.ID.NO.8.

34. A chimeric protein according to any of claims 29-31, wherein said chimeric protein includes the sequence shown
in SEQ.ID.NO.10.

35. A chimeric protein according to any of claims 29-31, wherein said chimeric protein includes the sequence shown
in SEQ.ID.NO.11.

36. Human collagen or fragment thereof produced by a prokaryotic cell, the human collagen or fragment thereof being 
capable of providing a self-aggregate. 

37. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 36 wherein the human 
collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ.ID.NO.19. 

38. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 36 wherein the human 
collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ.ID.NO.39. 

39. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 36 wherein the human 
collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ.ID.NO.43. 

40. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 36 wherein the human 
collagen or fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ.ID.NO.45. 

41. Human collagen or fragment thereof produced by a prokaryotic cell according to claim 36 wherein the collagen or 
fragment thereof is encoded for by nucleic acid having the sequence shown in SEQ.ID.NO.31. 

42. Nucleic acid comprising the sequence shown in SEQ.ID.NO. 19.

43. Nucleic acid comprising the sequence shown in SEQ.ID.NO. 31. 

44. Nucleic acid comprising the sequence shown in SEQ.ID.NO. 43. 

45. Nucleic acid comprising the sequence shown in SEQ.ID.NO. 45. 

46. Nucleic acid encoding a human Extracellular Matrix Protein (EMP) or fragment thereof wherein the codon usage 
in the nucleic acid sequence reflects preferred codon usage in a prokaryotic cell. 

47. Nucleic acid according to claim 46 wherein the prokaryotic cell is E. coli. 

48. Nucleic acid according to claim 43 wherein the EMP is selected from the group consisting of collagen, fibrinogen, 
fibronectin and collagen-like peptide. 
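
The substitution of preferred codons for non-preferred codons recited in claims 1 and 46 can be illustrated with a minimal Python sketch. It assumes a hypothetical `PREFERRED` table naming one favoured codon for a few amino acids; the entries are illustrative choices of codons that occur in SEQ ID NO: 39 and are not a codon usage table taken from this specification. The function names `translate` and `recode` and the example fragment `native` are likewise assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop codon)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical preferred codons for the expression host (illustrative only)
PREFERRED = {"G": "GGC", "P": "CCG", "E": "GAA"}


def translate(dna):
    """Translate every complete codon, reading from the first base."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def recode(dna, preferred):
    """Replace each codon with the preferred synonymous codon, if one is listed."""
    usable = len(dna) - len(dna) % 3
    out = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)


native = "GGTCCCCCTGGTGAGCCT"  # encodes Gly Pro Pro Gly Glu Pro
recoded = recode(native, PREFERRED)
print(recoded)
print(translate(native) == translate(recoded))
```

Because only synonymous codons are exchanged, the encoded amino acid sequence is the same before and after recoding, which is the property the final comparison checks.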

5'- CAGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC
TGGCCCCATG GGTCCCTCTG GTOCTCGTGG TCTOOCTGGC CCCCCTGGTG 
CACCTGGTCC CCAAGGCTTC CAAGGTCCCC CTGGTGAGCC TGGCGAGCCT 
GGAGCTTCAG GTCCCATGGG TCOCCGAGGT CCCCCAGGTC CCCCTGGAAA 
GAATGGAGAT GATGGGGAAG CTGGAAAACC TGGTCGTCCT GGTGAGCGTG 
GGCCTCCTGG GCCTCAGGGT GCTCGAGGAT TGCCCGGAAC AGCTGGCCTC 
CCTGGAATGA AGGGACACAG AGGTTTCAGT GGTTTGGATG GTGCCAAGGG 
AGATGCTGGT CCTGCTGGTC CTAAGGGTGA GCCTGGC^ CCTGGTGAAA 
ATGGAGCTCC TGGTCAGATG GGCCCCCGTG GCCTGCCTGG TGAGAGAGGT 
CGCCCTGGAG CCCCIGGCCC TGCTGGTGCT CGTGGAAATG ATGGTGCTAC 
TGGTGCTGCC GGGCCCCCTG GTCCCACCGG CCCOGCTGGT CCTCCTGGCT 
TCCCTGGTGC TGTTGGTGCT AAGGGTGAAG CTGGTCCCCA AGGGCCCCGA 
GGCTCTGAAG GTCCCCAGGG TGTGCGTGGT GAGOCTGGCC CCCCIGGCCC 
TGCTGGTGCT GCTGGCCCTG CTGGAAACCC TGGTGCTGAT GGACAGCCTG 
GTGCTAAAGG TGCCAATGGT GCTCCTGC5TA TTGCTGGTGC TCCTGGCTTC 
CCTGGTGCCC GAGGCCCCTC TGGACCCCAG GGCCCCGGCG GCCCICCTGG 
TCCCAAGGGT AACAGCGGTG AAOCTGGTGC TCCIGC^AGC AAAGGAGACA 
CTGGTGCTAA GGGAGAGCCT GGCCCTGTTG GTGTTCAAGG ACCCCCTGGC 
CCTGCTGGAG AGGAAGGAAA GCGAGG.AGCT CGAGGTGAAC CCGGACCCa_C 
TGGCCTGCCC GGACCCCCTG GCGAGCGTGG TGGACCTGGT AGCCGTGGTT 
TCCCTGGCGC AC^ATGGTGTT GCTGGTCCCA AGGGTCCCGC TGGTGAACGT 
GGTTCTCCTG GOCCCGCTCG CCOCAAAGGA TCTCCTGGTG AAGCTGGTCG 
TCCCGGTGAA GCTGGTCTGC CTGGTGCGA GGGTCTGACT GGAAGCCCTG 
GCAGCCCTGG TCCTGATGGC AAAACTGGCC CCCCTGGTCC CGCCGGTCAA 
GATGGTCGCC CCGGACCCCC AGGCCCftCCT GGTGCCCGIG GTCAGGCTGG 
TGTGATGGGA TTCCCTGGAC CTAAAGGTGC TGCTGGAGAG CCCGGCAAGG 
CTGGAGAGCG AGGTGTTCCC GGACCCCCTG GCGCTGTCGG TCCTGCTGGC 
AAAGATGGAG AGGCTGGAGC TCAGGGACCC CCTOGCCCTG CTGGICCCGC 
TGGCGAGAGA GGTGAACAAG GCCCTGCTGG CT&CCCGGA TTOSGGGTC 
TCCCTGGTCC TGCTGGTCCT CCAGGTGAAG CAGGCAA^CC TGGTGAACAG 
GGTGTTCCTG GAGACCTTGG CGCCCCTGGC CCCTCIGGAG C^AJGAGGCQV 
GAGAGGTTTC CCTGGCGAGC GTGGTGTGCA AGGTCCCCCT GGTCCTGCTG 
GACCCCGAGG GGCCAACGGT GCTCCCGGCA ACGATGGTGC TAAGGGTGAT 
GCTGGTGCCC CTGGAGCTCC CGGTAGCCAG GGCGCCCCTG GCCTTCAGGG 
MTGCCTGGT GAACGTGGTG CAGCTGGTCT TCCAGGGCCT AAGGGTGAC\ 
GAGGTGATGC TGGTCCCAAA GGTGCTGATG GCTCTCCTGG CAAAGATGGC 
GTCCGTGGTC TGACCQ3XC CATTGGTCCT CCTGGCCCTG C/rGGTGCCCC.
TGGTGACAAG GGTGAAAGTG GTCCCAGCGG CCCTGCTGGT CCCACTGGAG 
CTCGTGGTGC CCCCGGAGAC CGTGGTGAGC CTGGTCCCCC CGGCCCTGCT 
GGCTTTGCTG GCCCCCCTGG TGCTGACGGC CAACCTGGTG CTAAAGGCGA 
ACCTGGTGAT GCTGGTGCCA AAGGCGATGC TGGTCCCCCT GGGCCTGCCG 
GACCCGCTGG AOCCCCTGGC CCCATTGGTA ATGTTGGTGC TOCTGGAGCC 
AAAGGTGCTC GOGGCAGCGC TGGTCCCCCT GGTGCTACTG GTTTCCCTGG 
TGCTGCTGGC CGAGTCGGTC CTCCTGGCCC CTCTGGAAAT GCTGGACCCC 
CTGGCCCTCC TGGTCCTGCT GGCAAAGAAG GCGGCAAAGG TCCCCGTGGT 
GAGACTGGCC CTGCTGGACG TCCTGGTGAA GTTGGTCCCC CTGGTCCCCC 
TGGCCCTGCT GGCGAGAAAG GATCCCCTGG TGCTGATGGT CCTGCTGGTG 
CTCCTGGTAC TCCCGGGCCT CAAGGTATTG CTGGACAGCG TGGTGTGGTC 
GGCCTGCCTG GTCAGAGAGG AGAGAGAGGC TTCCCTGGTC TTCCTGGCCC 
CTCTGGTGAA CCTGGCAAAC AAGGTCCCTC TGGAGCAAGT GGTGAACGTG 
GTCCCCCCGG TCCCATGGGC CCCCCTGGAT TGGCTGGACC CCCTGGTGAA 
TCTGGACGTG AGGGGGCTCC TGCTGCCGAA GGTTCCCCTG GACGAGACGG 
TTCTCCTGGC GOCAAGGGTG ACCGXGGTGA GACCGGCCCC GCTGG.aCCCC 
CTGGTC-CTCC TGGTGCICCT GGTGCCCCTG GCCCCGTTGG CCCTGCTGGC 
AAGAGTGGTG ATCGTGGTGA GACTGGTCCT C-CTGGTCCCG CCGGTCCCGT 
CGGCCCCGCT GGCGCCCGTG GCCCCGCCGG ACCO^AGGC CCCCGTGGTG 
ACaAGGGTGA GACAGGCGAA CAGGGCGACA GAGGCATAAA GGGTC a JCCGT 
GGCTTCTCTG GCCTCCAGGG TCOCCCTGGC CCTCCTGGCT CTCCTGGTGA 
ACAAGGTCCC TCTGGAGCCT CTGGTCCTGC TGGTCCCCGA GGTCCCCCTC 
GCTCTGCTGG TGCTCCTGGC AAAGATGGAC TCAACGGTCT CCCTGGCCCC 
ATTGGGCCCC CTCGTCCTCG CGGTCGCACT GGTGATGCTG GTCCTGTTGG 
TCCCCCCGGC CCTCCTGGAC CTCCTGGICC CCCTGGTCCT CCCAGCGCTG 
GTTTCGACTT CAGCTTCCTC CCCCAGCCAC CTCAAGAGAA GGCTCACGAT 
GGTGGCCGCT ACTACCGGGC T-3'



FIG. 3B 

FIG. 4

5'- CAGCTGTCTT ATGGCTATGA TGAGAAATCA ACCGGAGGAA TTTCCGTGCC
TGGCCCCATG GGTCCCTCTG GTCCTCGTGG TCTCCCTGGC CCCCCTGGTG
CACCTGGTCC CCAAGGCTTC CAAGGTCCCC CTGGTGAGCC TGGCGAGCCT
GGAGCTTCAG GTCCCATGGG TCCCCGAGGT CCCCCAGGTC CCCCTGGAAA
GAATGGAGAT GATGGGGAAG CTGGAAAACC TGGTCGTCCT-3'



FIG. 5 

FIG. 6

GGA TCC ATG GGG CTC GCT GGC CCA CCG GGC GAA CCG GGT
CCG CCA GGC CCG AAA GGT CCG CGT GGC GAT AGC GGG CTC
CCG GGC GAT TCC TAA TGG ATC C



FIG. 7 

Gly-Leu-Ala-Gly-Pro-Pro-Gly-Glu-Pro-Gly-Pro-Pro-
Gly-Pro-Lys-Gly-Pro-Arg-Gly-Asp-Ser 



FIG. 8 

FIG. 9

5'- CAGCGGGCCA GGAAGAAGAA TAAGAACTGC CGGCGCCACT CGCTCTATGT

GGACTTCAGC GATGTGGGCT GGAATGACTG GATTGTGGCC CCACCAGGCT

ACCAGGCCTT CTACTGCCAT GGGGACTGCC CCTTTCCACT GGCTGACCAC

CTCAACTCAA CCAACCATGC CATTGTGCAG ACCCTGGTCA ATTCTGTCAA

TTCCAGTATC CCCAAAGCCT GTTGTGTGCC CACTGAACTG AGTGCCATCT

CCATGCTGTA CCTGGATGAG TATGATAAGG TGGTACTGAA AAATTATCAG

GAGATGGTAG TAGAGGGATG TGGGTGCCGC -3'



FIG. 10 

FIG. 11

Mole percent of MBP

10 20 30 40 50 60

QLSYGYDEXS TGGISVPGPM GPSGPRGLPG PPGAPGPQGF QCPPGEPGEP GASGPMGPRG 

70 80 90 100 110 120 

PPGPPCKNGD DCEAGXPGRP GERGPPGPQG ARGLPGTACL PGMXGHRGFS GLDGAXCDAC 

130 140 150 160 170 180 

PAGPXGEPGS PCENGAPGQM GPRGLPCERG RPGAPCPAGA RCNDGATGAA CPPGPTGPAG 

190 200 210 220 230 240 

PPGFPGAVGA XGEAGPQCFR GSEGPQGVRC EPGPPGPAGA AGPAGNPGAD GQ PGAXGANC 

250 260 270 280 290 300 

APGIAGAPGF PGARGPSGPQ GPGCPPCPXG HSGEPGAPGS XCOTGAXGEP GFVGVQGPPG 

310 320 330 340 350 360 

PAGEECJCRGA RCEPGPTGLP GPPGERGGPG SRGFPGADGV AG PKG PAGER GSPGPAGPXG 

370 380 390 400 410 420 

SPGEAGRPCE AGLPGAXGLT GSPGSPGPDG KTG PPG PAGO DGRPGFPGFP GARGQAGVWG 

430 440 450 460 470 480 

FPGPXGAAGE PGKAGERGVP GPPGAVGPAG XDG EAGAQCP PGPACPAGER GEQGPAGSPG 

490 500 510 520 530 540 

FQGLPGPAC? PGEAGXPGEQ GVPGDLGAPG PSGARGERGF PGERGVQCPP GPXG?e^C^G 

550 560 570 580 590 600

APGNDGAKGD AGAPGAPGSQ GAPGLQGMPG ERGAAGLPGP KGDRGDAGPX GADGSPGKDG 

610 620 630 640 650 660 

VRGLTGPIGP P3PAGAPGDX GESGPSGPAG PTGARGAPCD RGEPG PPG PA GFAGPPGADG 

670 680 690 700 710 720 

QPGAXGEPGD AGAXGDAGPP GPAGPAGPPG PIGNVGAPGA XGARGSAGPP GATGFPGAAG 

730 740 750 760 770 780 

RVGPPGPSGN AGPPGPPGPA GKEGGKQfRG ETGPAGRPGE VGPPGPPGPA GEKGSPGADG 

790 800 810 820 830 840

PAGAPGTPGP QGIAGQRGW GLFGQRGERG FPGLPGPSGE PGXQGPSGAS CERGPPGFHG 

850 860 870 880 890 900 

PFGLAGPPGS SGREGAPAAE G5PGRDGSPG AXGDRGETGP AGPPGAXGAX GAPGPVGPAG 

910 920 930 940 950 960 

XSGDRGETGP AGPAGFVGPA GARGPACPQG PRGDXGETGE QGDRGIKGKR GFSGLQGPFG 

970 980 990 1000 1010 1020

PPGSPCEQGP SGASGPAGPR GPPGSAGAPG KDGLNGLPG? IGPPGPRGRT GDAGFVCPPG 

1030 1040 1050 1060 1070 1080 

PPGPPGPPGP PSACFDFSFL PQPPQEXAHD GGRYYRARSQ RARXXNXNCR RHSLYVDFSD 

1090 1100 1110 1120 1130 1140 

VCWNDa'IVAP PGYQAFYCHG DCPFPLADHL NSTNKAXVQT LVNSVNSSIP XACCVFTELS 

1150 1160 1170 1180 1190 1200

AISMOYLDEY DKWL.XNYQE MWEGCGCR* 



FIG. 13 

10 20 30 40 50 60

gggaaggatt tccatttccC AGCTCTCTTA TGGCTATGAT GAGAAATCAA CCCGAGGAAT 

70 80 90 100 110 120

TTCCCTGCCT CGCCCCATGG GTCCCTCTGG TCCTCGTGCT CTCCCTGGCC CCCCTGGTGC 

130 140 150 160 170 180 

ACCTGGTCCC CAAGGCTTCC AAGGTCCCCC TGQTGAGCCT GGCGAGCCTG GAGCTTCAGG 

190 200 210 220 230 240 

TCCCATGGGT CCCCGAGGTC CCCCAGGTCC CCCTC'iAAAG AATGGAGATG ATGGGGAAGC 

250 260 270 280 290 300

TGGAAAACCT GGTCGTCCTG GTGAGCGTGG CCCTOrTGGG CCTCAGGGTG CTCGAGGATT 

310 320 330 340 350 360 

GCCCGGAACA GCTGGCCTCC CTCGAATGAA CGGACACAGA GGTTTCAGTG GTTTGGATGG 

370 380 390 400 410 420 

TGCCAAGGGA GATGCTGGTC CTGCTGGTCC TAAGG'STGAG CCTGGCAGCC CTGGTGAAAA 

430 440 450 460 470 480

TGGAGCTCCT GGTCAGATGG GCCCCCGTGG CCTGCCTGGT GAGAGAGGTC GCCCTGGAGC 

490 500 510 520 530 540 

CCCTGGCCCT GCTGGTGCTC GTGGAAATGA TGGTO:TACT GGTGCTGCCG GGCCCCCTGG 

550 560 570 580 590 600 

TCCCACCGGC CCCGCTGGTC CTCCTGGCTT CCCTGOTGCT GTTGGTGCTA AGGGTGAAGC 

610 620 630 640 650 660

TGGTCCCCAA GGGCCCCGAG GCTCTGAAGG TCCCCAGGGT GTGCCTGGTG AGCCTGGCCC 

670 680 690 700 710 720 

CCCTGGCCCT GCTGGTGCTG CTGGCCCTGC TGGA^ACCCT GGTGCTGATG GACAGCCTGG 

730 740 750 760 770 780 

TGCTAAAGGT CCCAATGGTG CTCCTGGTAT TGCT03TGCT CCTGGCTTCC CTGGTGCCCG 

790 800 810 820 830 840 

AGGCCCCTCT GGACCCCAGG GCCCCGGCGG CCCTCCTGGT CCCAAGGGTA ACAGCGGTGA 

850 860 870 880 890 900 

ACCTGGTGCT CCTGGCAGCA AAGGAGACAC TGCTGCTAAG GGAGAGCCTG GCCCTGTTGG 

910 920 930 940 950 960 

TGTTCAAGGLV CCCCCTGGCC CTGCTGGAG* GGAAG'IAAAG CGAGGAGCTC GAGGTGAACC 

970 980 990 1000 1010 1020

CGGACCCACT GGCCTGCCCG GACCCCCTGG CGAGC'STGGT GGACCTGGTA GCCGTGGTTT 

1030 1040 1050 1060 1070 1080 

CCCTGGCGCA GATGGTGTTG CTGGTCCCAA CCCTOXGCT CCTGAACGTG GTTCTCCTGG 

1090 1100 1110 1120 1130 1140 

CCCCGCTGGC CCCAAAGGAT CTCCTGGTGA AGCTC'STCGT CCCGGTGAAC CTGGTCTGCC 



1150 1160 1170 1180 1190 1200

TGCTGCCAAG GGTCTGACTG GAAGCCCTGG CAGCCOTGGT CCTGATGGCA AAACTGGCCC

1210 1220 1230 1240 1250 1260

CCCTCGTCCC CCCCGTCAAG ATCGTCGCCC CQCACOCCCA GGCCCACCTG CTGCCCCTGG

FIG. I4A 

1270 1280 1290 1300 1310 1320

TCAGGCTGGT GTGATGGGAT TCCCTGGACC TAAAGGTGCT GCTGGAGAGC CCGGCAAGGC 

1330 1340 1350 1360 1370 1380

TGGAGAGCGA GGTCTTCCCG GACCCCCTGG CGCTCTCGGT CCTGCTGGCA AAGATGGAGA 

1390 1400 1410 1420 1430 1440 

GOCTGGAGCT CAGCGACCCC CTGGCCCTGC TGGTCOCGCT GGCGAGAGAG GTGAACAAGG 

1450 1460 1470 1480 1490 1500 

CCCTCCTGGC TCCCCCGGAT TCCAGGGTCT CCCTCiTTCCT GCTGGTCCTC CAGGTGAAGC 

1510 1520 1530 1540 1550 1560 

AGGCAAACCT GCTGAACAGG GTGTTCCTGG AGACCTTGGC CCCCCTGGCC CCTCTGGACC 

1570 1580 1590 1600 1610 1620 

AACAGGCCAC AGAGGTTTCC CTGGCGAGCG TGGTCTGCAA GCTCCCCCTG GTCCTGCTGG 

1630 1640 1650 1660 1670 1680 

ACCCCGAGGG GCCAACGGTG CTCCCGGCAA CGATGGTGCT AAGGGTGATG CTGGTGCCCC 

1690 1700 1710 1720 1730 1740

TGGAGCTCCC GGTAGCCAGG GCGCCCCTGG CCTTCXGGGA ATGCCTGGTG AACGTGGTGC 

1750 1760 1770 1780 1790 1800 

AGCTGGTCTT CCAGGGCCTA AGGGTGACAG AGGTGATGCT GGTCCCAAAG GTGCTGATGG 

1810 1820 1830 1840 1850 1860 

CTCTCCTGGC AAAGATGGCG TCCGTGCTCT GACCG^CCCC ATTGGTCCTC CTGGCCCTGC 

1870 1880 1890 1900 1910 1920

TCGTGCCCCT GGTGACAAGG CTGAAAGTGG TCCCA3CGGC CCTGCTGGTC CCACTGGAGC 

1930 1940 1950 1960 1970 1980 

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TGGTCCCCCC GGCCCTGCTG GCTTTGCTGG 

1990 2000 2010 2020 2030 2040 

CCCCCCTGGT GCTGACGGCC AACCTGGTGC TAAAGGCGAA CCTGGTGATG CTGGTGCCAA 

2050 2060 2070 2080 2090 2100

AGGCGATGCT GGTCCCCCTG GGCCTGCCGG ACCCGCTGGA CCCCCTGGCC CCATTGGTAA 

2110 2120 2130 2140 2150 2160 

TGTTGGTGCT CCTGGAGCCA AAGGTGCTCG CGGCAGCGCT GGTCCCCCTG GTGCTACTGG 

2170 2180 2190 2200 2210 2220 

TTTCCCTGG7 GCTGCTGGCC GAGTCGGTCC TCCTGSCCCC TCTGGAAATG CTGGACCCCC 

2230 2240 2250 2260 2270 2280

TGCCCCTCCT GGTCCTGCTG GCAAAGAAGG CGGCAAAGGT CCCCGTGGTG AGACTGGCCC 

2290 2300 2310 2320 2330 2340 

TGCTGGACGT CCTGGTGAAG TTGGTCCCCC TGGTCCCCCT GGCCCTGCTG GCGAGAAAGG 

2350 2360 2370 2380 2390 2400 

ATCCCCTGGT GCTGATGGTC CTGCTCCTGC TCCTGGTACT CCCGGGCCTC AAGGTATTGC 

2410 2420 2430 2440 2450 2460 

TGGACAGCGT GGTGTGGTCG GCCTGCCTGG TCAGAGAGGA GAGAGAGGCT TCCCTGCTCT 

2470 2480 2490 2500 2510 2520 

TCCTGGCCCC TCTGGTCAAC CTGGCAAACA AGCTCCCTCT GGAGCAAGTG GTCAACCTGG 

2530 2540 2550 2560 2570 2580

TCCCCCCGGT CCCATGGGCC CCCCTCGATT GGCTGGACCC CCTGCTGAAT CTGGACGTGA 

2590 2600 2610 2620 2630 2640 

GGGGGCTCCT GCTGCCGAAG GTTCCCCTGG ACGAGACGGT TCTCCTGGCG CCAAGGGTCA 

2650 2660 2670 2680 2690 2700 

CCGTGGTGAG ACCGGCCCCG CTGGACCCCC TGCTGCTOJT GCTGCTOJTG GTGCCCCTGG 

2710 2720 2730 2740 2750 2760 

CCCCGTTGGC CCTGCTGGCA AGAGTGGTGA TCGTGGTGAG ACTGGTCCTG CTGGTCCCGC 

2770 2780 2790 2800 2810 2820 

CGGTCCCGTC GGCCCCGCTG GCGCCCGTGG CCCCCCCGGA CCCCAAGGCC CCCGTGGTGA 

2830 2840 2850 2860 2870 2880

CAAGGCTGAG ACAGGCGAAC AGGGCGACAG AGGCATAAAG GGTCACCGTG GCTTCTCTGG 

2890 2900 2910 2920 2930 2940 

CCTCCAGGCT CCCCCTGGCC CTCCTGGCTC TCCTGGTGAA CAAGGTCCCT CTGCAGCCTC 

2950 2960 2970 2980 2990 3000 

TGGTCCTGCT GGTCCCCGAG GTCCCCCTGG CTCTGCTGGT GCTCCTGGCA AAGATGGACT 

3010 3020 3030 3040 3050 3060 

CAACGGTCTC CCTGGCCCCA rTGGGCCCCC TGGTCCTCGC GGTCGCACTC GTGATGCTGC 

3070 3080 3090 3100 3110 3120 

TCCTCTTGGT CCCCCCGGCC CTCCTGGACC TCCTGGTCCC CCTGGTCCTC CCAGCGCTGG 

3130 3140 3150 3160 3170 3180

TTTCGACTTC AGCTTCCTCC CCCAGCCACC TCAAGAGAAG GCTCACGATG CTGGCCGCTA 

3190 3200 3210 3220 3230 3240 

CTACCGGGCT agatccCAGC GGGCCAGGAA GAAGAATAAG AACTGCCGGC GCCACTCGCT 

3250 3260 3270 3280 3290 3300 

CTATGTGGAC TTCAGCGATC TGGGCTGGAA TGACTGGATT GTGGCCCCAC CAGGCTACCA 

3310 3320 3330 3340 3350 3360

GGCCTTCTAC TGCCkTGGGG ACTGCCCCTT TCCAC1X5GCT GACCACCTCA ACTGAACCAA 

3370 3380 3390 3400 3410 3420 

CCATGCCATT CTGCAGACCC TGGTCAATTC TGTCAATTCC AGTATCCCCA AAGCCTGTTG 

3430 3440 3450 3460 3470 3480 

TCTGCCCACT GAACTGAGTG CCATCTCCAT GCTGTACCTG GATGAGTATG ATAAGCTGCT 

3490 3500 3510 3520 3530 3540
ACTGAAAAAT TATCAGGAGA TCGTAGTAGA GGGhTGTGGG TGCCGCTAAa agctt 
10 20 30 40 50 60

QLSYGYDEXS TGGISVPGPM GPSGPRCLPC PFGAPGPQGF QGPPGEPGEP GASGFWGPRG 

70 80 90 100 110 120

PPGPWKNGD DCnCKFGR? GERGPPCPQG AKGLPGTAGL PCMKGHRGFS CLDGAXGCttG 

130 140 150 160 170 180

PACPXGEPCS PGENGAPGQM CPRCLFGERC RPGAPGPAGA RGNDGATGAA GPPGPTGPAG 

190 200 210 220 230 240

PPGFPGAVGA KGEACPQGPR GSDGWVRC EPGPPGPAGA AGPAGNPGAD GOFGAXGANG 

250 260 270 280 290 300

APGIAGAPGF PGARCPSGPQ GPGGPPGPKG WSGEPGAPG5 XCOTGAXCEP GPVGVQGPPG 

310 320 330 340 350 360 

PACEEGXRGA RGEPGPTGLP CPFGERGGPC TRGF PGADGV AGPXCPAGER GSPGPAGPXG 

370 380 390 400 410 420

SPGEACRPGE AGI/PGAXGOT GSPGSPGPDG KTGPPGPAGQ DGRFGPFCPP GARGCACVWG 

430 440 450 460 470 480 

FPGPXGAAGE PGXAGERGVP GPPGAVGPAG KDGEAGAQGP PG PAG PAGER GSQGPAGSPG 

490 500 510 520 530 540 

FQGLPGPAG? PGEAGKPGEQ CVPGDUGAPG PSCARCERGF PGERCVQGPP GPAGPRGANG 

550 560 570 580 590 600 

AFGNDGAKGD AGAPCAFGSQ GAPGLQGMPC ERGAAGLPGP KGDRGDAGPX GADGSPGXDG 

610 620 630 640 650 660

VRGLTGPIGP PGPAGAPGDX GESGPSGPAG PTCARGAPGD RGEPGPPGPA GFAGPPGATC 

670 680 690 700 710 720 

QPGAXGEPGD AGAXGDAGPP G PAG PAG PPG PIGNVGAPGA XGARGSAGP? GATGFPGAAG 

730 740 750 760 770 780 

RVGPPGPSGN AGPFGPPGPA GKEGGKGPRG ETGPAGRPGE VG PPG PPG PA GEKGSPGADG 

790 800 810 820 830 840 

PAGAPGTPGP QGXAGORGVV GLPGQRGERG FPGLFGPSGS PGXQGPSGAS GERGPPGFMG 

850 860 870 880 890 900 

PPGLAGPPCE SGRBGAPAAE GSPGRDGSPG AXGDRGETG? AGPPGAXGAX GAPGPVGPAG 

910 920 930 940 950 960 

X5GDRGETGF AGPAGPVGPA GARGPAGPQG PRGDXGCTGE QGDRGIXGKR GFSGLQGPPG 

970 980 990 1000 1010 1020

PPGSPGEQCP SGASGPAGPR G PPG SAGA PC KDGEKCLPGP IGPPGPRGRT GDAGPVGPPG 

1030 1040 1050 1060 1070 1080 

PFGPPCPPGP PSACFDFSFL PQPPQEKAHD GGRYYRARSA LOIWCFSST EXNCCVRQLY 

1090 1100 1110 1120 1130 1140 

IDFRKDLGVK WIHEPXGYHA NFC LG PC PY I WSLD7QYSKV LALYNQKNPC ASAAPCCVPQ 

1150 1160 1170 1180 1190 1200 
ALEPLPrVYY VGRXPKVEQL SNMIVRSCXC S' 
10 20 30 40 50 60

gggaaggatt CCCatttCCC AGCTGTCTTA TGGCTATGAT GAGAAATCAA CCGGAGGAAT 

70 80 90 100 110 120

TTCCGTGCCT GGCCCCATCG GTCCCTCTGC TCCTCGTGCT CTCCCTGCCC CCCCTGGTGC 

130 140 150 160 170 180

ACCTGGTCCC CAAGGCTTCC AAGCTCCCCC TGCTCIiGCCT GGCGAGCCTG GAGCTTCAGG 

190 200 210 220 230 240

TCCCATCGGT CCCCGAGGTC CCCCAGCTCC CCCTG:3AAAG AATGGAGATG ATGGGGAAGC 

250 260 270 280 290 300 

TGGAAAACCT GGTCGTCCTG GTGACCCTGG GCCTCCTGGG CCTCAGGGTG CTCGAGGATT 

310 320 330 340 350 360

GCCCGGAACA GCTGGCCTCC CTGGAATGAA GGGACtCAGA GGTTTCAGTG GTTTCGATGG 

370 380 390 400 410 420 

TCCCAAGGGA GATGCTGGTC CTGCTGCTCC TAAGGC/TGAG CCTGGCAGCC CTGGTGAAAA 

430 440 450 460 470 480

TGGAGCTCCT GGTCAGATGG GCCCCCCTCG CCTGCCTGGT GAGAGAGGTC GCCCTGGAGC 

490 500 510 520 530 540 

CCCTCGCCCT GCTGGTGCTC GTGGAAATGA TCGTGCTACT GGTGCTGCCC GCCCCCCTGG 

550 560 570 580 590 600

TCCCACCGGC CCCGCTGGTC CTCCTGGCTT CCCTOISTGCT GTTGGTGCTA AGGGTGAAGC 

610 620 630 640 650 660 

TGGTCCCCAA GGGCCCCGJxQ GCTCTGAAGG TCCCCAGGGT GTGCGTGGTG AGCCTGGCCC 

670 680 690 700 710 720 

CCCTCGCCCT GCTGCTGCTG CTGGCCCTGC TGGAAACCCT GGTGCTGATG GACAGCCTGG 

730 740 750 760 770 780 

TGCTAAAGGT GCCAATGGTG CTCCTGGTAT TCCTCGTGCT CCTGGCTTCC CTGCTGCCCG 

790 800 810 820 830 840

AGGCCCCTCT GGACCCCAGG GCCCCGGCGG CCCTCCTGGT CCCAAGGGTA ACAGCGGTGA 

850 860 870 880 890 900

ACCTGGTGCT CCTGGCAGCA AAGGAGACAC TGGTGCTAAG GGAGAGCCTG GCCCTGTTGG 

910 920 930 940 950 960 

TCTTCAAGGA CCCCCTCGCC CTGCTGGAGA GGAAGGAAAG CGAGGAGCTC GAGGTGAACC 

970 980 990 1000 1010 1020

CGGACCCACT GGCCTGCCCG CACCCCCTGG CGAGCGTGCT GGACCTCGTA GCCGTGGTTT 

1030 1040 1050 1060 1070 1080 

CCCTGGCGCA GATGGTGTTC CTGGTCCCAA GGGTCCCGCT CGTGAACGTG GTTCTCCTGG 

1090 1100 1110 1120 1130 1140 

CCCCGCTGGC CCCAAAGGAT CTCCTGGTGA ACCTGGTCGT CCCGGTGAAC CTGGTCTGCC 

1150 1160 1170 1180 1190 1200 

TGCTGCCAAG GGTCTCACTG GAACCCCTGC CAGCCCTCCT CCTGATGGCA AAA CTGGCCC 

1210 1220 1230 1240 1250 1260 

CCCTGGTCCC GCCGGTCAAG ATGGTCGCCC CGGACCCCCA GGCCCACCTG GTGCCCGTGG 
1270 1280 1290 1300 1310 1320

TCAGGCTGGT GTGATGGGAT TCCCTGGACC TAAAGGTGCT GCTGGAGAGC CCCGCAAGCC 

1330 1340 1350 1360 1370 1380 

TGGAGAGCGA GGTGTTCCCG GACCCCCTCG COCTCTCGGT CCTCCTCGCA AACATGGACA 

1390 1400 1410 1420 1430 1440 

GGCTGGAGCT CAGGGACCCC CTGGCCCTGC TGCTCCCGCT CGCCACACAG CTCAACAAGC 

1450 1460 1470 1480 1490 1500 

CCCTGCTCCC TCCCCCGGAT TCCAGGGTCT CCCTCGTCCT GCTGGTCCTC CAGGTGAAGC 

1510 1520 1530 1540 1550 1560

AGGCAAACCT GGTGAACAGG GTGTTCCTCG AGACCTTGCC GCCCCTGCCC CCTCTGGAGC 

1570 1580 1590 1600 1610 1620

AAGAGGCGAG AGAGGTTTCC CTGGCGAGCG TGGTGTGCAA GGTCCCCCTG GTCCTGCTGG 

1630 1640 1650 1660 1670 1680 

ACCCCGAGGG GCCAACGGTG CTCCCGGCAA CGATGGTGCT AAGGGTGATG CTGGTGCCCC 

1690 1700 1710 1720 1730 1740 

TGGAGCTCCC GGTAGCCACG GCGCCCCTGG CCTTCAGGCA ATGCCTGGTG AACGTGGTGC 

1750 1760 1770 1780 1790 1800 

AGCTGGTCTT CCAGGGCCTA AGGGTCACAG AGGTGATGCT CGTCCCAAAG GTGCTGATGG 

1810 1820 1830 1840 1850 1860 

CTCTCCTGGC AAAGATGGCG TCCGTGGTCT GAOXKJCCCC ATTGGTCCTC CTGGCCCTGC 

1870 1880 1890 1900 1910 1920

TCGTGCCCCT GGTGACAAGG GTGAAAGTGG TCCCACJCGGC CCTGCTGGTC CCACTGGAGC 

1930 1940 1950 1960 1970 1980 

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TGGTCCCCCC GGCCCTGCTG GCTTTGCTGG 

1990 2000 2010 2020 2030 2040 

CCCCCCTGGT GCTGACGGCC AACCTGGTGC TAMGGCGAA CCTGGTGATG CTGGTGCCAA 

2050 2060 2070 2080 2090 2100

AGGCGATGCT GGTCCCCCTG GGCCTGCCGG ACCCGCTGGA CCCCCTGGCC CCATTGGTAA 

2110 2120 2130 2140 2150 2160 

TGTTGGTGCT CCTGGAGCCA AAGGTGCTCG CGGCA-IJCGCT GGTCCCCCTG GTGCTACTGG 

2170 2180 2190 2200 2210 2220 

TTTCCCTGGT GCTGCTGGCC GAGTCGGTCC TCCTG3CCCC TCTGGAAATG CTGGACCCCC 

2230 2240 2250 2260 2270 2280

TGGCCCTCCT GGTCCTGCTG .GCAAAGAACG CGGCAAAGGT CCCCGTGGTG AGACTGGCCC 

2290 2300 2310 2320 2330 2340 

TGCTGGACGT CCTGGTGAAG TTGGTCCCCC TGCTCCCCCT GGCCCTGCTG GCGAGAAAGG 

2350 2360 2370 2380 2390 2400 

ATCCCCTGGT GCTGATGGTC CTGCTGGTGC TCCTGGTACT CCCGGGCCTC AAGGTATTGC 

2410 2420 2430 2440 2450 2460 

TC^ACACCCT GGTGTGGTCG GCCTCCCTGC TCAGA-SAGGA GACAGAGGCT TCCCTGGTCT 

2470 2480 2490 2500 2510 2520 

TCCTGGCCCC TCTGCTGAAC CTGGCAAACA AGGTCCCTCT GGACCAAGTG GTCAACGTGG 
■«COCcSS CCCAwSS CCCCltSS OOCTOcSS CCTGGtSS CTGGACCTCA

caaao£& octcccSSS cnrccc^ ac^gacS? icxcciSS ccaagg^™ 

2650 2660 2670 2680 2690 2700

CCGTOG^ ACCGGCCCcS CTCGACCCCC TCCTCCTCCT GGTCCTCTTC GTOCCCCTGG 

2710 2720 2730 2740 2750 2760

OCCCCtSgC CCTCCTCGCA AGAGTGGTCA TCGTGGTGAG ACTGGTCCTG CTCGTCCCCC 

2770 2780 2790 2800 2810 2820

CGGTCCCGTC GGCCCCCCPG CCGCCCGTGG CCCCGCCGGA CCCCAAGCCC CCCGTGGTGA 

2830 2840 2850 2860 2870 2880

CAACGGTGAG ACAGGCGAAC AGGGCGACAG AGGCA?AAAG GGTCACCGTG GCTTCTCTCG 

2890 2900 2910 2920 2930 2940 

CCTCCAGGGT CCCCCTCGCC CTCCTCGCTC TCCTGGTGAA CAAGGTCCCT CTGGAGCCTC 

2950 2960 2970 2980 2990 3000

TGGTCCTGCT GGTCCCCGAG GTCCCCCTGG CTCTGCTCGT GCTCCTGGCA AAGATGGACT 

3010 3020 3030 3040 3050 3060

CAACGGTCTC CCTGGCCCCA TTGGGCCCCC TGGTCCTCGC GGTCGCACTG GTGATGCTGG 

3070 3080 3090 3100 3110 3120 

TCCTGTTGGT CCCCCCGCCC CTCCTGGACC TCCTG^TCCC CCTGGTCCTC CCAGCGCTGG 

3130 3140 3150 3160 3170 3180 

TTTCGACTTC AGCTTCCTCC CCCAGCCACC TCAAG/.GAAG GCTCACGATG GTGGCCGCTA 

3190 3200 3210 3220 3230 3240 

CTACCGGGCT agatctGCCC TGGACACCAA CTATTCiCTTC AGCTCCACGG AGAAGAACTG 

3250 3260 3270 3280 3290 3300 

CTGCGTGCGG CAGCTGTACA TTGACTTCCG CAAGGACCTC GGCTGGAAGT GGATCCACGA 

3310 3320 3330 3340 3350 3360

CCCCAAGGGC TACCATGCCA ACTTCTGCCT CGGGCCCTGC CCCTACATTT <5GAGCCTGGA 

3370 3380 3390 3400 3410 3420 

CACGCAGTAC AGCAAGGTCC TGGCCCTGTA CAACC/.GCAT AACCCGGGCG CCTCGGCGGC 

3430 3440 3450 3460 3470 3480 

GCCGTGCTGC GTGCCGCAGG CCCTGGAGCC GCTGCCCATC GTGTACTACG TCGGCCGCAA 

3490 3500 3510 3520 3530 3540

GCCCAAGGTG GACCAGCTGT CCAACATGAT CGTCCGCTCC TGCAAGTCCA CCTCAtctag 

3550 3560 3570 3580 3590 3600 
10 20 30 40 50 60

OLSVGYDEXS TGCISVPCPM CPSGPRCLPG PPGAPGPQGF QGPPCEPCEP GASGPHGPRG 

70 80 90 100 110 120

PPGPPGKNGD DCEAGKPGRP GERCPPGPQG ARCLFCTAGL IfcKKCKRGFS GLDGAXGDAG 

130 140 150 160 170 180

PACPXGEPCS FCENGAPGQU GPRCLPGERG RPGAPGPAGA RGHDGATGAA GPPCPTCPAC 

190 200 210 220 230 240 

PPGr PGAVGA KGEAGPQCPR GSEGPQGVKG EPGPFCPAGA AGPAGNPGAD GQPGAXGANG 

250 260 270 280 290 300 

AFC1ACAPGF PGAKGPSGPQ GPGGPPGFKG WSGEPCAPGS KCDTGAXCEP GPVGVCGPPG 

310 320 330 340 350 360

PAGEEGXRGA RGEPGP7CLP GPPGERGGPG SRGFFCADGV AG PKC PAGER GSFCPAGPXG 

370 380 390 400 410 420 

SPCEAGRPCE AGLPGAXGLT GSPCSPGPDG XTGPPCPAGQ DGRPGPPCPP GAKGQAGVMG 

430 440 450 460 470 480 

FPGPKGAAGE PGKAGERGVP GPPGAVGPAG KDGEAGAQG? PC PAG PAGER GEQGPAGSPG 

490 500 510 520 530 540 

FQGLPGPAGP PGEAGXPCEQ GVPCDLGAPG PSGARGERGF ^GERGVQGPP G PAG PRC AN G 

550 560 570 580 590 600 

AFOTDGAXGD AGAPGAPCSQ GAPGLQGMPG ERGAAGLPGP XGDRGOAGPX GADGSPCXDG 

610 620 630 640 650 660 

VRGL/TCPIGP FCPAGAPCOX GESGPSGPAG FTGARSAPGD RGEPGPPGPA GFAGPFGADG 

670 680 690 700 710 720

QPGAXGEPGD AGAXGDAGPP G PAG PAG PPG PIGKV3APCA KGARGSAGPP GATGF PGAAG 

730 740 750 760 770 780 

RVGPPGPSGM AG PPG PPG PA GKEGGXGPRG ETCPAGRFGS VGPPGPPGPA GEXCSPGADC 

790 800 810 820 830 840 

PAGAPCTPGP QGIAGQRCVV GL?GQHGmG KPGLPGPSG2 PGKQGPSGAS GERCPPGPMG 

850 860 870 880 890 900

PPGLAGPPGE SGREGAPAAE CSPCRDCSPG AXGDRCETG? AGPPGAXGAX GAPGPVGPAG 

910 920 930 940 950 960

KSGDRGETGP ACPAGPVGPA GARGPAGPQG PRGDKCE7GE QGDRGIKGhT* GFSGLQGPPG 

970 980 990 1000 1010 1020

PPGSPGEQGP SGASG PAG PR G PPG SAGA PG XDGLKCLPCP IGPPGPKGRT GDAGPVGPPG 

1030 1040 1050 1060 1070 1080 

? PGP PGP PGP PSACFDFSFL PQPPQEXAKD GGRYVRARSO EASGIGPEVP DDRDFEPSLG 

1090 1100 1110 1120 1130 1140 

PVCPtHCQCH LRWQCSDLG LDXVPXDLP? D7TLLDLQWN* XITEIKDCDF KNLXNLKALI 

1150 1160 1170 1180 1190 1200 

L\0OTKI5KVS PGAFTPJLVKL ERLYLSKNQL K£LPmi?KT LQSLRAHENS ITKVRKVTFN 

1210 1220 1230 1240 1250 1260 

CLKQMIVIEL GTNPLXSSCI ENCAFQCUXX LSYIR1ADTN ITSIPQGLPP SLTEOHLDGN 
1270 1280 1290 1300 1310 1320

KISRVUAASL KCLNNUOOC LSFNSISAVD NCSLWTPHL RELHLCNWXL TRVPGGLA£H 


1330 1340 1350 1360 1370 1380 

XYIQWYLHN NNISWGSSD FCPPGHNTKK ASYSGVSLFS NPVQYWEIQP STFKCWVRS 

1390 1400 1410 1420 1430 1440 
AIQLGNYX* 



FIG. 17B






EP 0 992 586 A2 



10 20 30 40 50 60

QLSYCYDEKS TCGISVPCPM CPSCPRGLPC PPGAPCPQCF QGPPCEPGEP CASGPMCPRG 

70 80 90 100 110 120

PPCPPCXNGD DGEAGKPGRP GERGPPGPQG ARCLPCTAGL PCMXCKRCFS GLDGAXGDAG 

130 140 150 160 170 180 

PAGPXGEPGS PGENGAPGQM GFRGLPGERG JUXSAPGPAGA KGNDGATGAA GPPGPTGPAG 

190 200 210 220 230 240 

PPG FPGA VGA KGEAGPQGPR CSECPQGVRG EPCPPGPAGA AGPAGNPGAD GQPGAKGANG 

250 260 270 280 290 300 

APGIAGAPGF PCARCPSCPQ GPGGPPGPXG NSGEPGAPCS KGOTGAXGEP CFVCVQCPFG 

310 320 330 340 350 360 

PAGE5GKRGA RGEPGFTCLP GPPGERGGPG SRGFPGADGV AGPKGPAGER GSPGPAGPXG 

370 380 390 400 410 420 

SPGEAGRPGS AGLPGAXGLT GSPGSPGPDG KTGPPGPAGQ DGRPGPPGPP GARGQAGVMG 

430 440 450 460 470 480 

FPGPXGAAGE PGKAGERGV? GPPGAVGPAG XDG2AGAQGP ?G PAG PAGER GEQGPAGSPG 

490 500 510 520 530 540 

FQGLPGPAGP FGEAGXPGEQ GVPGDLGAPG PSGARGERGF PGERGVQGP? GPAGPKGANG 

550 560 570 580 590 600 

APGNDGAXGD AGAPGAPGSQ GAPGLQCKPG. ERGAAGLPGP KGORGDAGPK GADGSFGXDG 

610 620 630 640 650 660 

VRGLTGPICP PGPAGAPGDX GESGP5GPAG FTGARGAPCD RGEPGPPGPA GFACPFGATO 

670 680 690 700 710 720 

QPGAXGEPGD AGAXGDAGPP GPAGPAGPPG PIGNVGAPCA KGARGSAGPP GATCFPGAAG 

730 740 750 760 770 780 

RVGFFGFSGN AGPPGPPGPA CXEGGXGPRG ETCPAGRFGE VGPPGPPGPA GEXGSFGADG 

790 800 810 820 830 840

PAGAPGTPCP QGIAGQRGW GLPGQRGERG FPGLPGPSGE PGKQGPSGAS GERGPPGPMG 

850 860 870 880 890 900 

PPG LAG PPG E SGREGAPAAB GSPGRDGSPG AXGDRGETOP AGPPGAXGAX GAPGPVGPAG 

910 920 930 940 950 960

XSGDRGETGP AGPAGPVGPA GARGPAGPQG PRGDKGETGE QCDRGIKGKR GFSGLQGPPG 

970 980 990 1000 1010 1020

PPGSPGEOGP SGASGPAGPR GPPGSAGAPG XDGLNGLPGP IGPPGPRGRT GDACPVGPPG 

1030 1040 1050 1060 1070 1080

PPGPPGPPGP PSACFDFSFL PQPPQEXAHD GCRYYRARSjP KDLPPDTTLL DLQ!*NXITEI 

1090 1100 1110 1120 1130 1140 
KDGDFXWLKN LHALIDVNMK ISKVSPG* 
CAG CTC TC? TAT OGC GAT GAO AAA TCA ACC CGA GGA ATT TCC CTG CCT GGC CCC ATG

7 - B7 96 105 114 

GCT CCC tS GOT CCT CGT GGT CTC CCT GGC CCC CCT GGT GCA CCT GGT CCC CAA GGC TTC 

12 g 133 147 ' 156 165 174 

GGT CCC CCr GGT GAG CCT GGC GAG CCT GGA GCT TCA GGT CCC ATG GGT CCC CGA GGT 

216 225 234 



CAA 



133 
GAG 

lftg 198 207 216 

CCC CCA OCT CCC CCT GGA AAG MT GGA GAT GAT GGG GAA GCT GGA AAA CCT GGT CGT CCT 

249 253 267 276 235 294 

GGT GAG CGT GGG CCT CCC GOG CCT CAG GGT GCT CGA GGA TTG CCC GGA ACA GCT GGC CTC 

327 336 

CCT GGA AW AAG GGA CAC 

373 387 396 

CCT GCT CGT CCT 

438 447 456 465 474 

OGC CTG CCT GGT GAG AGA GGT CGC CCT GGA GCC CCT GGC CCT GCT GGT GCT 

534 



303 3)8 >t ' jj" 

AGA GGT TTC AGT GGT TTG GAT GGT GCC AAG GGA GAT GCT GGT 



345 
GCC 

405 



354 
GAT 

414 



369 .wo jo' jjo 

A\G GGT GAG CCT CGC AGO CCT GGT GAA AAT GGA GCT CCT GGT CAG ATG 

429 438 447 456 



GGC CCC CGT 



516 



525 



489 438 507 j< ^ 

CGT GGA AAT GAT GGT GCT ACT GG? GCT GCC GGG CCC CCT GGT CCC ACC GGC CCC GCT GGT 



576 



585 



594 



549 558 567 JUJ J;7 , 

CCT CCT OX TTC CCT GGT GCT GTT GGT GCT AAG GGT GAA GCT GGT CCC CAA GGG CCC CGA 

609 618 627 636 645 654 

GGC TCT GAA GGT CCC CAG GGT GTG CCT GGT GAG CCT GGC CCC CCT GGC CCT GCT GGT GCT 

669 678 687 696 705 714 

GCT GGC CCT GCT GCA AAC CCT GGT GCT GAT GGA CAG CCT GGT GCT AAA GGT GCC AAT GGT 

729 735 747 756 765 774 

GCT CCT GGT ATT GCT CGI* GCT CCT CGC TTC CCT GGT GCC CGA GGC CCC TCT GGA CCC CAG 

789 793 S07 816 S25 834 

GGC CCC GGC GGC CCT CCT GGT CCC AAG GGT AAC AGC GGT GAA CCT GGT GCT CCT CGC AGC 

*4? S58 867 876 885 894 

AAA GGA GAC ACT GGT CCT AAG GGA GAG CCT GGC CCT GTT GGT GTT CAA GGA CCC CCT GGC 

90S 913 927 936 945 954 

CCT GCT GGA GAG CAA GGA AAG CGA GGA GCT CGA GGT GAA CCC GGA CCC ACT GGC CTG CCC 

$69 978 * 937 996 1005 1014 

GGA CCC CCT GGC GAG CGT GGT CGA CCT GGT AGC CGT GGT TTC CCT GGC GCA GAT GGT GTT 

1029 1038 1047 1056 1065 1074 

uct ccr ccc mc cot ccc gct got gaa cgt ccr tct cct ggc ccc got ggc ccc aaa gga 

1089 1098 1107 1116 1125 1134

*CT o:r OCT CAh GCT OCT CGT CCC GCT GAA GCT GGT CTG CCT GGT GCC AAG~GGT CTG ACT 

1149 1158 1167 1176 11B5 . 

GCA AGC CCT GGC AOC CCT GGT CCT GAT GGC AAA ACT GGC CCC CCT GGT CCC GCC GGT CAA 

r 1?09 1218 1227 i 2 3 6 1245 

G.V. a,. CGC CCC CGA C CC CCA GGC CCA CCT GGT GCC CGT GGT CAG GCT* GGT GTG ATG GGA 
1269 1278 1287 1296 1305 1314

■T'i>: ccr go-a rev cgt ccr gctt coa" gag ccc ogc aag ccr cv.a gag cga ggt gtt ccc 

1329 1338 1347 1356 1365 1374

GGA CCC CCT CCC CCT GrC GOT CCT CCT GCC AAA GAT GGA GAG GCT GGA GCT CAG GGA CCC 

1389 1398 1407 1416 1425 1434

CCT GGC CCT GCT GGT CCC ' GCT GGC GAG AGA GGT GAA CAA GGC CCT GCT GGC TCC CCC GGA 

1449 1458 1467 1476 1485 1494

TTC CAG GGT CTC CCT GCT CCT GCT GGT CCT CCA GGT GAA GCA GGC AAA CCT GGT GAA CAG 

1509 1518 1527 1536 1545 1554 

GGT GTT CCT GGA GAC CTT GGC GCC CCT GGC CCC TCT GGA GCA AGA GGC GAG AGA GGT TTC 

1569 1578 1587 1596 1605 1614

CCT GGC GAC CGT GGT GTG CAA CGT CCC CCT GGT CCT GCT GGA CCC CGA GGG GCC AAC GGT 

1629 1638 1647 1656 1665 1674 

GCT CCC GCC AAC GAT GCT GCT AAC CGT GAT GCT GGT GCC CCT GGA GCT CCC GGT AGC CAG 

1689 1698 1707 1716 1725 1734 

GGC GCC CCr GGC CTT CAG GGA ATG CCT GGT GAA CGT GGT GCA GCT GGT CTT CCA GGG CCT 

1749 1758 1767 1776 1785 1794

AAG GGT GAC AGA GGT GAT GCT GGT CCC AAA GGT GCT GAT GGC TCT CCT GGC AAA GAT GGC 

1809 1818 1827 1836 1845 1854

GTC CGT GGT CTC ACC CGC CCC ATT GGT CCT CCT GGC CCT GCT GGT GCC CCT GGT GAC AAG 

1869 1878 1887 1896 1905 1914

GOT GAA ACT GGT CCC AGC GGC CCT GCT GGT CCC ACT GGA GCT CGT GGT GCC CCC GGA GAC 

1929 1938 1947 1956 1965 1974

CGT GGT GAG CCT GGT CCC CCC GGC CCT GCT GGC TTT GCT GGC CCC CCT GGT GCT GAC GGC 

1989 1998 2007 2016 2025 2034 

OA COT CGT GCT AAA GGC GAA CCT GGT GAT CCT GGT GCC AAA GGC GAT GCT GGT CCC CCT 

2049 2058 2067 2075 20S5 ?nQd 

CCK5 CCT GCC GGA CCC GCT CGA CCC CCT GGC CCC ATT GGT AAT GTT GGT GCP CCT GGA GCC 

2109 2US 2127 2136 2145 

° yt COf ° : - C CG - «C CCP GGT CCC CCr GCT OCT ACT GGT TTC CCT GGT GCT GCT GGC 

2169 2178 2137 2196 o^a 

CGA GTC GGT CCT CCF CGC CCC TCT GGA AAT GCT GGA CCC CCT GGC CCT CCT GGT CCT GCT 

2229 2239 * 2247 2256 2265 *>?1A 

C*C AM GA* GGC GGC AAA GGT CCC CGT GGT GAG ACT GGC CCT GCT GGA CCT CCT GGT GAA 

2289 229e 2307 2316 

err ccc ccr ot ccc ccr »c ccr ccr cgc «g aaa gga'tIc ccr got S gat gct 
ccr cct"o^ oct ccr «t act ccc'oS ccr caaot att cc/cS cac cgt'gIt gtg otc 

GSC CTC'W COT CAg'SI CCA GAs'SS CGC TTc'ot GGT CTt'cct' GGC CCC^ GGT GAA 
CCr COC a 5£ CAA «P 2 S TCT OGa'S AGT GGt'g" CGT «T 2 3S CCC GGT 2 CCC ATG GGC 
2529 2538 2547 2556 2565 2574

CCC CCT GGA VIC CCT CCA CCC CCT GCT GAA TCT GCA CGT GAG GGG GCT CCT GCT GCC GAA 

2589 2598 2607 2616 2625 2634 

GGT TCC CCT GGA CCA GAC GGT TCT CCT GGC CCC AAG GGT GAC CGT GGT GAG ACC GGC CCC 

2649 2658 2667 2676 2685 2694

GCT GGA CCC CCT GGT Or? CCT GGT OCT CCT GGT GCC CCT GGC CCC GTT GGC CCT GCT GGC 

2709 2718 2727 2736 2745 2754

AAG ACT CGT CAT CGV GCC GAG ACT CGT CCT GCT* GGT CCC GCC GGT CCC GTC GGC CCC GCT 

2769 2778 2787 2796 2805 2814

GGC GCC CGT GGC CCC GCC GGA CCC CAA GGC CCC CGT GGT GAC AAG GGT GAG ACA GGC GAA 

2829 2838 2847 2856 2865 2874

CAG GGC GAC AGA GGC ATA AAG GGT CAC CGT GGC TTC TCT GGC CTC CAG GGT CCC CCT GGC 

2889 2898 2907 2916 2925 2934 

CCT CCT GGC TCT CCT CGT GAA CAA CGT CCC TCT GGA GCC TCT GGT CCT GCT GGT CCC CGA 

2949 2958 2967 2976 2985 2994

GGT CCC CCT GGC TCT GCT GGT GCT CCT GGC AAA GAT GGA CTC AAC GGT CTC CCT GGC CCC 

3009 3018 3027 3036 3045 3054 

ATT GGG CCC CCT CGT CCT CGC GGT CGC ACT GGT GAT GCT GGT CCT GTT GG? CCC CCC GGC 

3069 3078 3087 3096 3105 3114

CCT CCT GGA CCT CCT GGT CCC CCT GGT CCT CCC AGC GCT GGT TTC GAC TTC AGO TTC CTC 

3129 3138 3147 3156 3165 3174 

CCC CAG CCA CCT CAA GAG K\G GCT CAC GAT GGT GGC CGC TAC TAC CGG GCT AGA TCC GAT 

3189 3198 3207 3216 3225 3234

GAG GCT TCT GGG ATA GCC CCA CAA GTT CCT GAT GAC CGC GAC TTC GAG CCC TCC CTA GGC 

3249 32S3 32*7 3276 3285 3294 

CCA GTC TGC CCC TTC CGC TCT CAA TGC CAT CTT CGA GTG GTC CAG TCT TCT GAT TTG GGT 

3309 3318 3327 3336 3345 3354 

CTC GAC AAA GTC CCA AM GAT CTT CCC CCT GAC ACA ACT CTG CTA GAC CTG CAA AAC AAC 

3369 3378 3387 3396 3405 3414 

AAA ATA ACC GAA ATC AAA GAT GGA GAC TTT AAG AAC CTG AAG AAC CTT CAC GCA TTG ATT 

3433 3447 3456 3465 3474 

Cix GTC AAC AAT AAA ATP AGC AAA GTT AGT CCT GGA GCA TTT ACA CCT TTC GTG AAG TTG 

3439 345S - 3507 3516 3525 3534 

CUA CGA CTT TAT CTG TCC AAG AAT* CAG CTG AAG GAA TTG CCA GAA AAA ATG CCC AAA ACT 

r:Jr*t >553 3567 3576 3585 3594 

CiT CA, GAO CTG CGT GCC CAT GAC AAT GAG ATC ACC AAA GTG CGA AAA GTT ACT TTC AAT 

3609 3618 2627 3636 3645 3654 

GGA CTG AAC CAG ATG ATT GTC ATA GAA CTG GGC ACC AAT CCG CTG AAG AGC TCA GGA ATT 

3**9 3678 • 3637 3696 3705 3714 

<.AA AA? GGG CCT TTC CAG GGA ATG AAG AAG CTC TCC TAC ATC CGC ATT GCT GAT ACC AAT 

, rn ^P 373? 3747 3756 3765 3774 

AIL -CC AoC ATT CCT CA» GGT C'T CCT CCT TCC CTT ACC GAA TTA CAT CTT GAT GGC AAC 
10 20 30 40 50 60

•Oggaaggatt tccattCccC AGCTGTCTTA TGGCTATGAT GACAAATCAA CCCGAGGAAT 

70 80 90 100 110 120 

TTCCGTGCCT GGCCCCATGG GTCCCTCTGG TCCTCGTGGT CTCCCTGGCC CCCCTGGTGC 

<\\r\ ISO 160 170 18° 

ACCWGlic? CAAGGCTTCC AAGGTCCCCC TGCTCAGCCT GGCGAGCCTG GAGCTTCAGG 

TCCCATGGGT CCCCCAG^ CCCCAGG^C CCCTGCAAAC AATGGAG^TC ATGGGGXAGC 

TGGAAAACCT GGTCGTCCTG GTGAGCGTGG GCCl^ CCTtAGG^G CTCGAGGATT 

GCCCGaJ^ GCTGGCC^CC £TGGAATGAA GGGACACAGA GGTTTCAGTG CTrTGGATGG 

370 380 390 400 410 420 

TGCCAAGGGA GA7X3CTGCTC CTCCTGCTCC TAAGGGTGAG CCTGGCAGCC CTGGTCAAAA 

430 440 450 460 470 480 

TGGAGCTCCT GGTCAGATGG GCCCCCCTGG CCTGCCTGGT GAGAGAGGTC GCCCTGGAGC 

490 500 510 520 530 540 

CCCTGGCCCT GCTGGTGCTC GTGGAAATGA TGGTGCTACT GGTGCTGCCG GGCCCCCTGG 

550 560 570 580 590 600 

TCCCACCGGC CCCGCTGGTC CTCCTGCCTT CCCTGGTGCT GTTGGTGCTA AGGGTGAAGC 

610 620 630 640 650 660 

TGGTCCCCAA GGGCCCCCAG GCTCTGAAGG . TCCCC AGGGT GTGCGTGGTG AGCCTGGCCC 

670 680 690 700 710 720

CCCTGGCCCT GCTGGTGCTG CTGGCCCTGC TGGAAACCCT GGTGCTGATG GACAGCCTGG 

730 740 750 760 770 780 

TGCTAAAGGT GCCAATGGTG CTCCTGGTAT TGCTGGTGCT CCTGGCTTCC CTGGTGCCCG 

790 800 810 820 830 840

AGGCCCCTCT GGACCCCAGG GCCCCGGCGG CCCTCCTGGT CCCAAGGGTA ACAGCGGTGA 

850 860 870 880 890 900

ACCTGGTGCT CCTGGCAGCA AAGGAGACAC TGGTGCTAAG GGAGAGCCTG GCCCTGTTGG 

910 920 930 940 950 960

TGTTCAAGGA CCCCCTGGCC CTGCTGGAGA GGAAGGAAAG CGAGGAGCTC GAGGTGAACC 

970 980 990 1000 1010 1020

CGGACCCACT GGCCTGCCCG GACCCCCTGG CGAGCCTGGT GGACCTGGTA CCCGTGGTTT 

1030 1040 1050 1060 1070 1080 

CCCTGGCGCA GATGGTGTTG CTGGTCCCAA GGGTCCCGCT GGTCAACGTG GTTCTCCTGG 

1090 1100 1110 1120 1130 1140 

CCCCGCTCGC CCCAAAGGAT CTCCTGGTGA AGCTGGTCGT CCCGGTCAAG CTGGTCTGCC 

1150 1160 1170 1180 1190 1200 

TGGTGCCAAC GGTCTCACTC GAAGCCCTGC CACCCCTCGT CCTGATGGCA AAACTCGCCC 

1210 1220 1230 1240 1250 1260 

CCCTOGTCCC CCCGGTCAAG ATGGTCGCCC CGCACCCCCA GGCCCACCTG GTGCCCGTGG 
1270 1280 1290 1300 1310 1320

TCAGGCTGGT GTGATGCGAT TCCCTCGACC TAAACCTGCT CCTGGACAGC CCGGCAAGGC 

1330 1340 1350 1360 1370 1380 

TGGAGAGCGA GGTCTTCCCG GACCCCCTGG CGCTGTCGCT CCTCCTCOCA AAGATGGAGA 

1390 1400 1410 1420 1430 1440 

GGCTGGAGCT CACCGACCCC CTGGCCCTGC TGCTCCCGCT GGC GAGA GAG CTGAACAAGG 

1450 1460 1470 1480 1490 1500 

CCCTGCTGGC TCCCCCGGAT TCCAGGGTCT CCCTGGTCCT CCTGGTCCTC CAGGTGAAGC 

1510 1520 1530 1540 1550 1560 

AGGCAAACCT GGTGAACAGG GTGTTCCTGG AGACCTTGGC GCCCCTGGCC CCTCTGGAGC 

1570 1580 1590 1600 1610 1620 

AACAGCCGAC AGAGGTTTCC CTGGCGAGCG TGCTGTCCAA GGTCCCCCTG GTCCTGCTGG 

1630 1640 1650 1660 1670 1680 

ACCCCGAGGG GCCAACGGTC CTCCCGGCAA CGATGGTGCT AAGGGTGATG CTGGTGCCCC 

1690 1700 1710 1720 1730 1740 

TGGAGCTCCC GGTAGCCAGG GCGCCCCTGG CCTTCAGGGA ATGCCTGGTG AACGTGGTGC 

1750 1760 1770 1780 1790 1800 

AGCTGGTCTT CCAGGGCCTA AGGGTGACAG AGGTGATGCT GGTCCCAAAG GTGCTGATGG 

1810 1820 1830 1840 1850 1860 

CTCTCCTGGC AAAGATGGCG TCCCTGGTCT GACCGGCCCC ATTGGTCCTC CTGGCCCTGC 

1870 1880 1890 1900 1910 1920 

TGCTGCCCCT GGTGACAAGG GTGAAAGTGG TCCCAGCGCC CCTGCTGGTC CCACTGGAGC 

1930 1940 1950 1960 1970 1980

TCGTGGTGCC CCCGGAGACC GTGGTGAGCC TGGTCCCCCC GGCCCTGCTC GCTTTGCTGG 

1990 2000 2010 2020 2030 2040 

CCCCCCTGGT GCTGACGGCC AACCTGGTGC TAAACGCGAA CCTGGTGATG CTGGTOCCAA 

2050 2060 2070 2080 2090 2100 

AGGCGATGCT GGTCCCCCTG GGCCTGCCGG ACCCGCTCGA CCCCCTGGCC CCA1TGGTAA 

2110 2120 2130 2140 2150 2160

TGTTGGTGCT CCTGGAGCCA AAGGTGCTCG CGGCAGCGCT GGTCCCCCTG CTGCTACTGG 

2170 2180 2190 2200 2210 2220 

TTTCCCTGGT GCTGCTGGCC GAGTCGGTCC TCCTGGCCCC TCTGGAAATG CTGGACCCCC 

2230 2240 2250 2260 2270 2280 

TGGCCCTCCT GGTCCTGCTG GCAAAGAAGG CGGCAAAGGT CCCCCTGGTG AGACTGGCCC 

2290 2300 2310 2320 2330 2340

TCCTCCACGT CCTGGTGAAG TTGGTCCCCC TGGTCCCCCT GGCCCTGCTC GCGAGAAAGG 

2350 2360 2370 2380 2390 2400

ATCCCCTGGT GCTGATGGTC CTGCTGGTGC TCCTGGTACT CCCGGGCCTC AAGCTATTGC 

2410 2420 2430 2440 2450 2460

TCGACACCCT GGTGTGGTCG GCCTGCCTGC TCACACACGA GACAGAGGCT TCCCTGGTCT 

2470 2480 2490 2500 2510 2520

TCCTGGCCCC TCTGGTGAAC CTGGCAAACA AGGTCCCTCT CGAGCAAG TG CTGAACGTGG 
2530 2540 2550 2560 2570 2580

TCCCCCCCCT CCCATCCGCC CCCCTGGATT GGCTGGACCC CCTGCTGAAT CTGGACGTGA 

2590 2600 2610 2620 2630 2640 

GGGGGCTCCT GCTGCCGAAG CTTCCCCTGC ACCAGACGGT TCTCCTGGCG CCAAGGGTGA 

2650 2660 2670 2680 2690 2700 

CCCTGGTGAG ACCGOCCCCG CTGGACCCCC TGGTGCTCNT GGTGCTCOTG GTGCCCCTGG 

2710 2720 2730 2740 2750 2760 

CCCCCTTGGC CCTGCTGGCA AGAGTGGTGA TCGTGGTGAG ACTGGTCCTG CTGGTCCCGC 

2770 2780 2790 2800 2810 2820 

CGGTCCCCTC GGCCCCGCTC GCCCCCGTGC CCCCGCCGGA CCCCAAGGCC CCCGTGGTGA 

2830 2840 2850 2860 2870 2880 

CAAGGGTGAG ACAGGCGAAC AGGGCGACAG AGGCATAAAG GGTCACCGTG GCTTCTCTGG 

2890 2900 2910 2920 2930 2940

CCTCCAGGCT CCCCCTGGCC CTCCTGGCTC TCCTGGTGAA CAAGGTCCCT CTGGAGCCTC 

2950 2960 2970 2980 2990 3000 

TGGTCCTGCT GGTCCCCGAG GTCCCCCTGG CICTGCTGGT GCTCCTGGCA AAGATGGACT 

3010 3020 3030 3040 3050 3060 

CAACGGTCTC CCTGGCCCCA TTCGGCCCCC TGGTCCTCGC GGTCGCACTC GTGATGCTGG 

3070 3080 3090 3100 3110 3120



TCCTGTTGGT CCCCCCGGCC CTCCTGGACC TCCTGGTCCC CCTGGTCCTC CCAGCcSS 

rrrcGA^c agcttc^c cccaccSS tcaagagaag gctcacS gtcgccSa 

3190 3200 3210 3220 3230 3240

CTACCGGGCT agatctCCAA AGGATCTTCC CCCTCACACA ACTCTGCTAG ACC^C^ 

caacaaS accgaaS aaga^S ctttaaS C^AA^C TTCtaSSl 

*«CxJS? AACAAtS TOIGcJJS TAGTCCTGGA TAAcC 9 «g 0 ??f? 



FIG. 20C






[Figure: vector map; labels: malE, polylinker cloning site, lacZ]






malE 




collagen 



FIG. 24 






malE 




FIG. 26 







GAA GCT GGA AAA CCT GGT CGT CCT GGT GAG CGT GGG CCT CCT GGG CCT CAG GGT

Glu Ala Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly

279 288 297 306 315 324

GCT CGA GGA TTG CCC GGA ACA GCT GGC CTC CCT GGA ATG AAG GGA CAC AGA GGT

Ala Arg Gly Leu Pro Gly Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly


333 342 351 360 369 378

TTC AGT GGT TTG GAT GGT GCC AAG GGA GAT GCT GGT CCT GCT GGT CCT AAG GGT

Phe Ser Gly Leu Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly


387 396 405 414 423 432

GAG CCT GGC AGC CCT GGT GAA AAT GGA GCT CCT GGT CAG ATG GGC CCC CGT GGC

Glu Pro Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly


441 450 459 468 477 486

CTG CCT GGT GAG AGA GGT CGC CCT GGA GCC CCT GGC CCT GCT GGT GCT CGT GGA

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala Arg Gly

495 504 513 522 531 540

AAT GAT GGT GCT ACT GGT GCT GCC GGG CCC CCT GGT CCC ACC GGC CCC GCT GGT

Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr Gly Pro Ala Gly

549 558 567 576 585 594

CCT CCT GGC TTC CCT GGT GCT GTT GGT GCT AAG GGT GAA GCT GGT CCC CAA GGG

Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly Glu Ala Gly Pro Gln Gly

603 612 621 630 639 648 

CCC CGA GGC TCT GAA GGT CCC CAG GGT GTG CGT GGT GAG CCT GGC CCC CCT GGC 

Pro Arg Gly Ser Glu Gly Pro Gln Gly Val Arg Gly Glu Pro Gly Pro Pro Gly

657 666 675 684 693 702

CCT GCT GGT GCT GCT GGC CCT GCT GGA AAC CCT GGT GCT GAT GGA CAG CCT GGT

Pro Ala Gly Ala Ala Gly Pro Ala Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly
711 720 729 738 747 756

GCT AAA GGT GCC AAT GGT GCT CCT GGT ATT GCT GGT GCT CCT GGC TTC CCT GGT

Ala Lys Gly Ala Asn Gly Ala Pro Gly Ile Ala Gly Ala Pro Gly Phe Pro Gly

765 774 783 792 801 810 

GCC CGA GGC CCC TCT GGA CCC CAG GGC CCC GGC GGC CCT CCT GGT CCC AAG GGT

Ala Arg Gly Pro Ser Gly Pro Gln Gly Pro Gly Gly Pro Pro Gly Pro Lys Gly

819 828 837 846 855 864

AAC AGC GGT GAA CCT GGT GCT CCT GGC AGC AAA GGA GAC ACT GGT GCT AAG GGA

Asn Ser Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly

873 882 891 900 909 918

GAG CCT GGC CCT GTT GGT GTT CAA GGA CCC CCT GGC CCT GCT GGA GAG GAA GGA

Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu Glu Gly

927 936 945 954 963 972 

AAG CGA GGA GCT CGA GGT GAA CCC GGA CCC ACT GGC CTG CCC GGA CCC CCT GGC

Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro Gly Pro Pro Gly

981 990 999 1008 1017 1026 

GAG CGT GGT GGA CCT GGT AGC CGT GGT TTC CCT GGC GCA GAT GGT GTT GCT GGT

Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly Ala Asp Gly Val Ala Gly

1035 1044 1053 1062 1071 1080

CCC AAG GGT CCC GCT GGT GAA CGT GGT TCT CCT GGC CCC GCT GGC CCC AAA GGA

Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser Pro Gly Pro Ala Gly Pro Lys Gly

1089 1098 1107 1116 1125 1134

TCT CCT GGT GAA GCT GGT CGT CCC GGT GAA GCT GGT CTG CCT GGT GCC AAG GGT

Ser Pro Gly Glu Ala Gly Arg Pro Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly

1143 1152 1161 1170 1179 1188

fl- ^ ^ ^ ^ ^ ^ 007 ^ C<< ~ A XX GGC CCC CCT GGT 
Leu Thr Gly Ser Pro Gly Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly

1197 1206 1215 1224 1233 1242 

Pro Ala Gly Gln Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly

1251 1260 1269 1278 1287 1296

CAG GCT GGT GTG ATG GGA TTC CCT GGA CCT AAA GGT GCT GCT GGA GAG CCC GGC

Gln Ala Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly

1305 1314 1323 1332 1341 1350 

AAG GCT GGA GAG CGA GGT GTT CCC GGA CCC CCT GGC GCT GTC GGT CCT GCT GGC

Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro Ala Gly
AAA GAT GGA GAG GCT GGA GCT CAG GGA CCC CCT GGC CCT GCT GGT CCC GCT GGC

Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly

1413 1422 1431 1440 1449 1458

GAG AGA GGT GAA CAA GGC CCT GCT GGC TCC CCC GGA TTC CAG GGT CTC CCT GGT

Glu Arg Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly Phe Gln Gly Leu Pro Gly

1494 1503 1512 

err «S « ^ 

Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys Pro Gly Glu Gln Gly Val Pro Gly

1521 1530 1539 1548 1557 1566

GAC CTT GGC GCC CCT GGC CCC TCT GGA GCA AGA GGC GAG AGA GGT TTC CCT GGC

Asp Leu Gly Ala Pro Gly Pro Ser Gly Ala Arg Gly Glu Arg Gly Phe Pro Gly

1575 1584 1593 1602 1611 1620 

GAG CGT GGT GTG CAA GGT CCC CCT GGT CCT GCT GGA CCC CGA GGG GCC AAC GGT

Glu Arg Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly

1629 1638 1647 1656 1665 1674 

GCT CCC GGC AAC GAT GGT GCT AAG GGT GAT GCT GGT GCC CCT GGA GCT CCC GGT

Ala Pro Gly Asn Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly

1683 1692 1701 1710 1719 1728

AGC CAG GGC GCC CCT GGC CTT CAG GGA ATG CCT GGT GAA CGT GGT GCA GCT GGT

Ser Gln Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly

1737 1746 1755 1764 1773 1782

CTT CCA GGG CCT AAG GGT GAC AGA GGT GAT GCT GGT CCC AAA GGT GCT GAT GGC

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala Asp Gly

1791 1800 1809 1818 1827 1836

TCT CCT GGC AAA GAT GGC GTC CGT GGT CTG ACC GGC CCC ATT GGT CCT CCT GGC

Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile Gly Pro Pro Gly

1845 1854 1863 1872 1881 1890

CCT GCT GGT GCC CCT GGT GAC AAG GGT GAA AGT GGT CCC AGC GGC CCT GCT GGT

Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly Pro Ser Gly Pro Ala Gly

1899 1908 1917 1926 1935 1944

CCC ACT GGA GCT CGT GGT GCC CCC GGA GAC CGT GGT GAG CCT GGT CCC CCC GGC

Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp Arg Gly Glu Pro Gly Pro Pro Gly

1953 1962 1971 1980 1989 1998 

CCT GCT GGC TTT GCT GGC CCC CCT GGT GCT GAC GGC CAA CCT GGT GCT AAA GGC

Pro Ala Gly Phe Ala Gly Pro Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly

2007 2016 2025 2034 2043 2052

GAA CCT GGT GAT GCT GGT GCC AAA GGC GAT GCT GGT CCC CCT GGG CCT GCC GGA

Glu Pro Gly Asp Ala Gly Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly
2061 2070 2079 2088 2097 2106

CCC GCT GGA CCC CCT GGC CCC ATT GGT AAT GTT GGT GCT CCT GGA GCC AAA GGT

Pro Ala Gly Pro Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly

2115 2124 2133 2142 2151 2160

GCT CGC GGC AGC GCT GGT CCC CCT GGT GCT ACT GGT TTC CCT GGT GCT GCT GGC

Ala Arg Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly

2169 2178 2187 2196 2205 2214

CGA GTC GGT CCT CCT GGC CCC TCT GGA AAT GCT GGA CCC CCT GGC CCT CCT GGT

Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro Pro Gly

2223 2232 2241 2250 2259 2268

CCT GCT GGC AAA GAA GGC GGC AAA GGT CCC CGT GGT GAG ACT GGC CCT GCT GGA

Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr Gly Pro Ala Gly

2277 22e6 2255 2304 2313 2322 

CGT CCT GGT GAA GTT GGT CCC CCT GGT CCC CCT GGC CCT GCT GGC GAG AAA GGA 

Arg Pro ay Glu Val Gly Pro Pro Giy Pro Pro Gly Pro Ala Gly Giu Lys Gly 

2331 ' 2340 2349 2358 2367 2376 

TCC CCT GGT GCT GAT GGT CCT GCT GGT GCT CCT GGT ACT CCC GGG OCT CAA GGT 

Ser Pro ay Ala As? Gly Pro Aia ay Ala Pro Giy Thr Pro Gly Pro Gin Giy 

2385 ' 2394 2403 2412 . "2421 2430 

ATT GCT GGA C*G CGT GGT GTG GTC GGC CTG CCT GGT CfcG AGA GGA GAG AGA &GC • 

He Ala Gly Gin Arg Giy Val Val ay Leu Pro Giy Gin Arg Gly Giu Arg Giy 

2439 2448 2457 2466 2475 2484 

TTC CCT GGT CTT CCT GGC CCC TCT GGT G*A CCT GGC AAA CAA GGT CCC TCT GGA 



Phe Pro ay Leu Pro Giy Pro Ser ay Glu Pro Giy Lys Gin Gly Pro Ser Giy 

2493 2502 2511 2520 2529 . 2S38 

GCA AGT GGT GAA CGT GGT CCC CCC GGT CCC ATG GGC COC CCT GGA TTG GCT GGA 



Ala Ser ay Giu Arg Gly Pro Pro ay Pro Mec Gly Pro Pro Gly Leu Ala Giy 

2547 2556 2565 2574 2533 2S92 

CCC CCT GGT GAA TCT GGA CGT GAG GGG GCT CCT GCT GCC GAA GGT TCC CCT GGA 

Pro Pro ay Glu Ser Giy Arg Giu ay Ala Pro Ala Ala Glu Gly Ser Pro Giy 

2601 2610 2619 2628 2 637 2646 

CGA GAC QGT TCT CCT GGC GCC AAG GGT GAC CGT GGT GAG ACC GGC CCC GCT GGA 

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly 

2655 2 664 2673 2 682 2691 2700 

CCC CCT GGT GCT CCT GGT GCT CCT GGT GCC CCT GGC CCC GTT GGC CCT GCT GGC 

Pro Pro Gly Ala Pro Gly Ma Pro ay Ala Pro Gly Pro Val Gly Pro Ala Gly 
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2709 2718 2727 2736 27^5 2754 

AAG AGT GGT GAT CGT GG7 G.-G ACT GST CCT GCT GGT CCC GCC CGT CCC C-7C GGC 

Lys Ser Gly As? Arc Giy Giu Chr Giy Pro Ala Gly Pro Ala Gly Pre Val Gly 

2753 2772 2731 27S0 2759 2308 

CCC GCT GGC GCC CGT GGC CCC GCC GGA CCC CAA C-GC CCC CGT GGT GAC ?--<3 GGT 

Fro Ala Gly Ala Arg Giy Pro Ala Giy Pro Gin Gly Pro Arg Giy As? Lys Gly 

2817 2825 2835 2844 2853 2862 

GAG ACA GGC GAA CAG GGC GAC AGA GGC ATA AAG GGT CAC CGT GGC TTC TCT GGC 

Giu Thr Gly Glu Gin Giy As? Arg Gly lie Lys Giy His Arg Giy Phe Ser Giy 

2871 2380 2889 2898 2907 2916 

CTC CL-G GGT CCC CCT GGC CCT CCT GGC TCT CCT GGT GAA CAA GGT CCC TCT GGA 

Leu Gin Gly Pro Pro Giy Pro Pro Giy Ser Pro Gly Glu Gin Giy Pre Ser Giy 

2925 2934 * 2943 2952 2951 2970 

GCC TCT GG7 CCT CCT GGT CCC CGA GGT CCC CCT GGC TCT GCT GGT GCT CCT GGC 

Ala Ser Giy Pro Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Giy Ala Pro Gly 

2975 29S8 2997 3006 3015 3024 

AAA GAT GGA CTC AAC GGT CTC CCT GGC CCC ATT GGG CCC CCT GGT CCT GGC GGT 

Lys As? Gly Leu Asn Giy Leu Pre Giy Pre lie Gly Pro Pro Giy Pro Arg Giy 

3033 3042 30=1 3060 3069 3078 

CGC ACT GGT GAT GCT GGT CCT GTT GGT. CCC CCC GGC CCT CCT GGA CCT CCT GGT 

Arc Thr Giy As? Ala Gly Pro Vai Giy Pro Pro Gly Pro Pro Gly Pro Pro Giy 

3087 3096 3105 3114 3123 3132 

CCC CCT GGT CCT CCC AGC GCT GGT TTC GAC TTC AGC TTC CTC CCC C-jG CCA CCT 

Pro Pro Giy Pre Pro Ser Ala Giy Phe Asp Phe Ser Phe Leu Pro Gin Pro Pro 

3141 3150 31:9 3163 

CAA GAG AAG GCT C-JC GAT GGT GGC CGC TAC TAC CGG GCT 3' 

Glr. Glu Lys Ala His As? Gly Giy Arc Tyr Tyr Arg Ala 



FIG. 27E 



EP 0 992 586 A2 




Amino acid   Codon   MCol   ColECol

Proline      CCU     139     11
             CCC      93     12
             CCA       6     27
             CCG       0    189

Glycine      GGU     174    147
             GGC      97    179
             GGA      64      8
             GGG      11     12

FIG. 30
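
Counts of the kind tabulated for FIG. 30 can be obtained from any in-frame coding sequence by tallying the synonymous codons for proline and glycine. The Python sketch below is illustrative only. The function name `codon_usage`, the `CODONS` table and the 18-codon example input (adapted from the start of one of the optimized fragments in the sequence figures) are assumptions introduced here and are not part of the original disclosure.

```python
from collections import Counter

# Synonymous codons for the two amino acids listed in FIG. 30 (RNA alphabet).
CODONS = {
    "Proline": ("CCU", "CCC", "CCA", "CCG"),
    "Glycine": ("GGU", "GGC", "GGA", "GGG"),
}


def codon_usage(dna):
    """Count proline and glycine codons in an in-frame DNA coding sequence."""
    # Drop whitespace, normalise case, and convert DNA (T) to RNA (U).
    rna = "".join(dna.split()).upper().replace("T", "U")
    # Ignore any trailing bases that do not form a complete codon.
    usable = len(rna) - len(rna) % 3
    counts = Counter(rna[i:i + 3] for i in range(0, usable, 3))
    return {
        amino_acid: {codon: counts[codon] for codon in codons}
        for amino_acid, codons in CODONS.items()
    }


# Hypothetical example input: 18 codons written in the spaced style of the figures.
example = (
    "CAG CTG AGC TAT GGC TAT GAT GAA AAA AGC ACC GGC "
    "GGC ATC AGC GTG CCG GGC"
)
usage = codon_usage(example)
for amino_acid, table in usage.items():
    print(amino_acid, table)
```

Running the same function over a complete native sequence and over its codon-optimized counterpart would give two columns that can be compared codon by codon, in the manner of the MCol and ColECol columns.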



[Figure; axis label: [Hyp], mM]

FIG. 31

[Figure; axis label: [NaCl], mM]

FIG. 32

[Figure; axis label: Temperature]

FIG. 34

FIG. 35

FIG. 36

FIG. 37

FIG. 38



9 18 27 36 45 54

CAG CTG AGC TAT GGC TAT GAT GAA AAA AGC ACC GGC GGC ATC AGC GTG CCG GGC

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val Pro Gly

63 72 81 90 99 108

CCG ATG GGT CCG AGC GGC CCG CGT GGC CTG CCG GGC CCG CCA GGT GCG CCC GGT

Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro Gly Ala Pro Gly

117 126 135 144 153 162

CCG CAG GGC TTT CAG GGT CCG CCG GGC GAA CCG GGC GAA CCT GGT GCG AGC GGC

Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly Glu Pro Gly Ala Ser Gly

171 180 189 198 207 216

CCG ATG GGC CCG CGC GGC CCG CCG GGT CCG CCA GGC AAA AAC GGC GAT GAT GGC

Pro Met Gly Pro Arg Gly Pro Pro Gly Pro Pro Gly Lys Asn Gly Asp Asp Gly

225 234 243 252 261 270

GAA GCG GGC AAA CCG GGA CGT CCG GGT GAA CGT GGC CCC CCG GGC CCG CAG GGC

Glu Ala Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Pro Pro Gly Pro Gln Gly

279 288 297 306 315 324

GCG CGC GGA CTG CCG GGT ACT GCG GGA CTG CCG GGC ATG AAA GGC CAC CGC GGT

Ala Arg Gly Leu Pro Gly Thr Ala Gly Leu Pro Gly Met Lys Gly His Arg Gly

333 342 351 360 369 378

TTC TCT GGT CTG GAT GGT GCC AAA GGA GAC GCG GG? CCG GCG GGT CCG AAA GGT

Phe Ser Gly Leu Asp Gly Ala Lys Gly Asp Ala Gly Pro Ala Gly Pro Lys Gly

387 396 405 414 423 432

GAG CCG GGC AGC CCG GGC GAA AAC GGC GCG CCG GGT CAG ATG GGC CCG CGT GGC

Glu Pro Gly Ser Pro Gly Glu Asn Gly Ala Pro Gly Gln Met Gly Pro Arg Gly

441 450 459 468 477 486

CTG CCT GGT GAA CGC GGT CGC CCG GGC GCC CCG GGC CCA GCT GGC GCA CGT GGC

Leu Pro Gly Glu Arg Gly Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala Arg Gly

495 504 513 522 531 540

AAC GAT GGT GCG ACC GGT GCG GCC GGT CCA CCG GGC CCG ACG GGC CCG GCG GGT

Asn Asp Gly Ala Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr Gly Pro Ala Gly

549 558 567 576 585 594

CCC CCG GGC TTT CCG GGT GCG GTG GGT GCG AAA GGC GAA GCA GGT CCG CAG GGG

Pro Pro Gly Phe Pro Gly Ala Val Gly Ala Lys Gly Glu Ala Gly Pro Gln Gly

603 612 621 630 639 648

CCG CGC GGG AGC GAG GGT CCT CAG GGC GTT CGT GG? GAA CCG GGC CCG CCG GGC

Pro Arg Gly Ser Glu Gly Pro Gln Gly Val Arg Gly Glu Pro Gly Pro Pro Gly

657 666 675 684 693 702

CCG GCG GGT GCG GCG GGC CCG GCT GGT AAC CCT GGC GCG GAC GGT CAG CCA GGT

Pro Ala Gly Ala Ala Gly Pro Ala Gly Asn Pro Gly Ala Asp Gly Gln Pro Gly
711 720 729 738 747 756

(«C AAA GST GCC AAC «C CCS CCG GGT AT? OA GST CCA CS «K T7C CCC OCT 

AU Lys tty Ala C-Iy Ail ?ro Gly Xta Ala Giy Ala Pro Gly M» ?ro Gly 

765 774 783 792 801 810

GCC CGC GGC CCG TCC GGC CCG CAG GGC COG GGC GGC ^CG CCC GGC CCG AAA GGG 

Ma" cly Pro Ser «y Pro cln M Gly Giy Pro Pro Gly ?ro Lys Gly 

819 828 837 846 855 864

AAC AGC GGT GAA CCG GGT GCG CCG GGC AGC AAA GGC GAC ACC OCT COG AAA GGT 

Asn Ser Gly Glu Pro Gly Ala Pro Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly

873 882 891 900 909 918

GAA CCG GGC CCA GTG GGT GTT CAA GGC CCG CCG GGC CCG GCG GGC GAG GAA GGC

Glu Pro Gly Pro Val Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Glu Glu Gly

927 936 945 954 963 972

AAA CGC GGT GCT CGC GGT GAA CCG GGC CCG ACC GGC CTG CCT GGC CCG CCG GGA

Lys Arg Gly Ala Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro Gly Pro Pro Gly

981 990 999 1008 1017 1026

GAA CGT GGT GGC CCG GGT AGC CGC GGT TTT CCG GGC GCG GAT GGT GTG GCG GGC

Glu Arg Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly Ala Asp Gly Val Ala Gly

1035 1044 1053 1062 1071 1080

CCG AAA GGT CCG GCG GGT GAA CGT GGT AGC CCG GGC CCG GCG GGC CCA AAA GGC

Pro Lys Gly Pro Ala Gly Gln Arg Gly Ser Pro Gly Pro Ala Gly Pro Lys Gly

1089 1098 1107 1116 1125 1134

AGC CCG GGC GAG GCA GGA OG? CCG GGT GAA GCG GGT CTC CCG GGC GCC AAA GGT

Ser Pro Gly Glu Ala Gly Arg Pro Gly Glu Ala Gly Leu Pro Gly Ala Lys Gly

1143 1152 1161 1170 1179 1188

CTG ACC GGC TCT CCG GGC AGC CCG GGT CCG GAT GGC AAA ACG GGC CCG CCC GGT

Leu Thr Gly Ser Pro Gly Ser Pro Gly Pro Asp Gly Lys Thr Gly Pro Pro Gly

1197 1206 1215 1224 1233 1242

CCGGCCGGCC^GWOTCGCCCGC^CCSCCGG

Pro Ala Gly Gln Asp Gly Arg Pro Gly Pro Pro Gly Pro Pro Gly Ala Arg Gly

1251 1260 1269 1278 1287 1296

CAG GCG GGT Q7Z ATG GGC TTT CCA GGC CCC AAA GGT GCG GCG GGT GAA CCG GGC

Gln Ala Gly Val Met Gly Phe Pro Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly

1305 1314 1323 1332 1341 1350

AAA GCG GGC GAA CGC GGT GTC CCG GGT CCG CCG GGC GCT GTC GGG CCG GCG GGC

Lys Ala Gly Glu Arg Gly Val Pro Gly Pro Pro Gly Ala Val Gly Pro Ala Gly

1359 1368 1377 1386 1395 1404

AAA GAT GGC GAA GCG GGC GCG CAA GGC CCG CCG GGA CCA GCG GGT CCG GCG GGC

Lys Asp Gly Glu Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly
1413 1422 1431 1440 1449 1458

GAG CCC OGT GA* CM « CCG CO GGC AGC CCS CGT TTC CAG C-Gt C7C CCS GGC 

Gil Z' q cly Gil On £ 5i te ^ Gly Cl« Gly ^ Gly 

1467 1476 1485 1494 1503 1512

OCT GCg'gS CCA CCg'St GAA GCg\sC ^ CCG GGG GAA CM GGT GTG CCG GGC 
;" t ; £ ^ lly «« Ma ay lys ?« Gly Glu Gin Gly V*l Pro Gly 

1521 1530 1539 1548 1557 1566

gacc^Sg^ 

Gly Ma Pro cly Pro Ser ttj aII Gly Glu Arg Gly Phe Pro Gly 

1575 1584 1593 1602 1611 1620

GAA CGT GGT GTG CAG GGC CCG CCC GGC CCG GC? GGT CCG CGC GGC GCC AAC GGC

Glu Arg Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Pro Arg Gly Ala Asn Gly

1629 1638 1647 1656 1665 1674

GCG CCG GGC AAC GAT GGT GCG AAA GGT GAT GCG GGT GCC CCA GGT GCG CCG GGC

Ala Pro Gly Asn Asp Gly Ala Lys Gly Asp Ala Gly Ala Pro Gly Ala Pro Gly

1683 1692 1701 1710 1719 1728

AGC CAG GGC GCC CCG GGG CTG CAA GGC ATG CCG GGT GAA CGT GGT GCC GCG GGT

Ser Gln Gly Ala Pro Gly Leu Gln Gly Met Pro Gly Glu Arg Gly Ala Ala Gly

1737 1746 1755 1764 1773 1782

C?A CCG GGT CCG AAA GGC GAC CGC GGT GAT GCG GGT CCA AAA GGT GCG GAT GGC

Leu Pro Gly Pro Lys Gly Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala Asp Gly

1791 1800 1809 1818 1827 1836

TCC CCT GGC AAA GAT GGC GTT CGT GGT CTG ACT GGC CCG ATC GGC CCG GCG GGC

Ser Pro Gly Lys Asp Gly Val Arg Gly Leu Thr Gly Pro Ile Gly Pro Pro Gly

1845 1854 1863 1872 1881 1890

CCG GCA GGT GCC CCG GGT GAC AAA GGT GAA AGC GGT CCG AGC GGC CCA GCG GGC

Pro Ala Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly Pro Ser Gly Pro Ala Gly

1899 1908 1917 1926 1935 1944

CCC ACT GGT GCG CGT GGT GCC CCG GGC GAC CGT GGT GAA CCG GGT CCG CCG GGC

Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp Arg Gly Glu Pro Gly Pro Pro Gly

1953 1962 1971 1980 1989 1998

CCG GCG GGC TTT GCG GGC CCG CCA GGC GCT GAC GGC CAG CCG GGT GCG AAA GGC

Pro Ala Gly Phe Ala Gly Pro Pro Gly Ala Asp Gly Gln Pro Gly Ala Lys Gly

2007 2016 2025 2034 2043 2052

GAA CCG GGG GAT GCG GGT GCC AAA GGC GAC GCG GGT CCG CCG GGC CCT GCC GGC

Glu Pro Gly Asp Ala Gly Ala Lys Gly Asp Ala Gly Pro Pro Gly Pro Ala Gly

2061 2070 2079 2088 2097 2106

CCG GCG GGC CCG CCA GGC CCG ATT GGC AAC GTG GGT GCG CCG GGT GCC AAA GGT

Pro Ala Gly Pro Pro Gly Pro Ile Gly Asn Val Gly Ala Pro Gly Ala Lys Gly
2115 2124 2133 2142 2151 2160

GCG CGC « AGC GCT GGT CCG CCG GGC GCG ACC GGT TTC CCC GGT GCG GCG GGG

Ala Arg Gly Ser Ala Gly Pro Pro Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly

2169 2178 2187 2196 2205 2214

CGC GTG OCT CCG CCA GGC CCG AGC GGT AAC GCG GGC CCG CCG GGC CCG CCG GGC

Arg Val Gly Pro Pro Gly Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro Pro Gly

2223 2232 2241 2250 2259 2268

CCG GCG GGC AAA GAG GGC GGC AAA GGT CCG CGT GGT GAA ACC GGC CCT GCG GGA

Pro Ala Gly Lys Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr Gly Pro Ala Gly

2277 2286 2295 2304 2313 2322

CGT CCA GGT GAA GTG GGT CCG CCG GGC CCG CCG GGC CCG GCG GGC GAA AAA GGT

Arg Pro Gly Glu Val Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Glu Lys Gly

2331 2340 2349 2358 2367 2376

AGC CCG GGT GCG GAT GGT CCC GCC GGT GCG CCA GGC ACG CCG GGT CCG CAA GGT

Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala Pro Gly Thr Pro Gly Pro Gln Gly

2385 2394 2403 2412 2421 2430

ATC GCT GGC CAG CG? GGT GTC GTC GGG CTG CCG GGT CAG CGC GGC GAA CGC GGC

Ile Ala Gly Gln Arg Gly Val Val Gly Leu Pro Gly Gln Arg Gly Glu Arg Gly

2439 2448 2457 2466 2475 2484

TTT CCG GGT CTG CCG GGC CCG AGC GGT GAG CCG GGC AAA CAG GGT CCA TCT GGC

Phe Pro Gly Leu Pro Gly Pro Ser Gly Glu Pro Gly Lys Gln Gly Pro Ser Gly

2493 2502 2511 2520 2529 2538

GCG AGC GGT GAA CGT GGC CCG CCG GGT CCC ATG GGC CCG CCG GGT CTG GCG GGC

Ala Ser Gly Glu Arg Gly Pro Pro Gly Pro Met Gly Pro Pro Gly Leu Ala Gly

2547 2556 2565 2574 2583 2592

CCT CCG GGT GAA AGC GGT CGT GAA GGC GCG CCG GGT GCC GAA GGC AGC CCA GGC

Pro Pro Gly Glu Ser Gly Arg Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly

2601 2610 2619 2628 2637 2646

CGC GAC GGT AGC CCG GGG GCC AAA GGG GAT CGT GGT GAA ACC GGC CCG GCG GGC

Arg Asp Gly Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly

2655 2664 2673 2682 2691 2700

CCC CCG GGT GCA CCG GGC GCG CCG GGT GCC CCA GGC CCG GTG GGC CCG GCG GGC

Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val Gly Pro Ala Gly

2709 2718 2727 2736 2745 2754

AAA AGC GGT GAT CGT GGT GAG ACC GGT CCG GCG GGC CCG GCC GGT CCG GTG GGC

Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro Ala Gly Pro Val Gly

2763 2772 2781 2790 2799 2808

CCA GCG GGC GCC CGT GGC CCG GCC GGT CCG CAG GGC CCG CGG GGT GAC AAA GGT

Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Gln Gly Pro Arg Gly Asp Lys Gly
2817 2826 2835 2844 2853 2862

GAA ACG GGC GAA CAG GGC GAC CGT GGC ATT AAA GGC CAC CGT GGC TTC AGC GGC

Glu Thr Gly Glu Gln Gly Asp Arg Gly Ile Lys Gly His Arg Gly Phe Ser Gly

2871 2880 2889 2898 2907 2916

CTG CA£ GGT CCA CCG GGC CCG CCG GGC AGT CCG GGT GAA CAG GGT CCG TCC GGA

Leu Gln Gly Pro Pro Gly Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly

2925 2934 2943 2952 2961 2970

GCC AGC GGG CCG GCG GGC CCA CGC GGT CCG CCG GGC AGC GCG GGT GCG CCG GGC

Ala Ser Gly Pro Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala Pro Gly

2979 2988 2997 3006 3015 3024

AAA GAC GGT CTG AAC GGT CTG CCG GGC CCG ATC GGC CCG CCG GGC CCA CGC GGC

Lys Asp Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro Pro Gly Pro Arg Gly

3033 3042 3051 3060 3069 3078

CGC ACC GGT GAT GCG GGT CCG GTG GGT CCC CCG GGC CCG CCG GGC CCG CCA GGC

Arg Thr Gly Asp Ala Gly Pro Val Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly

3087 3096 3105 3114 3123 3132

^ « *l « CCG AGC GCG GGT TTC C?C TTC AGC TTC CTG CCG CAG CCG CCG

Pro Pro Gly Pro Pro Ser Ala Gly Phe Asp Phe Ser Phe Leu Pro Gln Pro Pro

3141 3150 3159 3168

CAG GAG AAA CCG CAC GAC GGC GGT CGC TAC TAG CGT GCG 3'

Gln Glu Lys Ala His Asp Gly Gly Arg Tyr Tyr Arg Ala
114 base pair fragment of human

9 18 27 36 45 54

5' CAG CTG AGC TAT GGC TAT GAT GAA AAA AGC ACC GGC GCC ATC ATC GTG CCG GGC

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val Pro Gly

63 72 81 90 99 108

CCG ATG GGT CCG AGC GGC CCG CGT GGC CTG CCG GGC CCG CCA GGT GCG CCC GGT

Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro Gly Ala Pro Gly

CCG
Pro

242 base pair fragment of human

9 18 27 36 45 54

5' CAG CTG AGC TAT GGC TAT GAT GAA AAA AGC ACC GGC GGC ATC AGC CTG CCG GGC

Gln Leu Ser Tyr Gly Tyr Asp Glu Lys Ser Thr Gly Gly Ile Ser Val Pro Gly

63 72 81 90 99 108

CCG ATG GGT CCG AGC GGC CCG CGT GGC CTG CCG GGC CCG CCA GGT GCG CCC GGT

Pro Met Gly Pro Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro Gly Ala Pro Gly

117 126 135 144 153 162

CCG CAG GGC TTT CAG GGT CCG CCG GGC GAA CCG GGC GAA CCT GGT GCG AGC GGC

Pro Gln Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly Glu Pro Gly Ala Ser Gly

171 180 189 198 207 216

CCG ATG GGC CCG CGC GGC CCG CCG GGT CCG CCA GGC AAA AAC GGC GAT GAT GGC

Pro Met Gly Pro Arg Gly Pro Pro Gly Pro Pro Gly Lys Asn Gly Asp Asp Gly

225 234

GAA GCG GGC AAA CCG GGA CGT CCG

Glu Ala Gly Lys Pro Gly Arg Pro

234 Nucleotide Human collagen Type I
360 Nucleotide Human collagen Type I
Nucleotide Human collagen Type I
( aj 3' Fragment with Optimized 
Codon Usage 
9 18 27 36 45 54

5' C7T> TAT GAT GGA AAA GGA GTT GGA CTT GGC CCT GGA CCA ATG GGC TTA ATG GGA

Gln Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu Met Gly

63 72 81 90 99 108

CCT AGA GGC CCA CCT GGT GCA GCT GGA GX CCA GGC CCT CAA GGT TTC CAA GGA

Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly

117 126 135 144 153 162

CCT GCT GGT GAG CCT GGT GAA CCT GGT CAA ACT GGT CCT GCA GGT GCT CGT GGT

Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly Pro Ala Gly Ala Arg Gly

171 180 189 198 207 216

CCA GCT GGC CCT CCT GGC AAG GCT GGT GAA GAT GGT CAC CCT GGA AAA CCC GGA

Pro Ala Gly Pro Pro Gly Lys Ala Gly Glu Asp Gly His Pro Gly Lys Pro Gly

225 234 243 252 261 270

CGA CCT GGT GAG AGA GGA GTT GTT GGA CCA CAG GGT GCT CGT GGT TTC CCT GGA

Arg Pro Gly Glu Arg Gly Val Val Gly Pro Gln Gly Ala Arg Gly Phe Pro Gly

279 288 297 306 315 324

^1 2^ ^* ®^ ^ T < 3S* GAT GGA 

Thr Pro Gly Leu Pro Gly Phe Lys Gly Ile Arg Gly His Asn Gly Leu Asp Gly

Leu Lys Gly Gln Pro Gly Ala Pro Gly Val Lys Gly Glu Pro Gly Ala Pro Gly

Glu Asn Gly Thr Pro Gly Gln Thr Gly Ala Arg Gly Leu Pro Gly Glu Arg Gly

Arg Val Gly Ala Pro Gly Pro Ala Gly Ala Arg Gly Ser Asp Gly Ser Val Gly

Pro Val Gly Pro Ala Gly Pro Ile Gly Ser Ala Gly Pro Pro Gly Phe Pro Gly

Ala Pro Gly Pro Lys Gly Glu Ile Gly Ala Val Gly Asn Ala Gly Pro Ala Gly

603 612 621 630 639 648

Pro Ala Gly Pro Arg Gly Glu Val Gly Leu Pro Gly Leu Ser Gly Pro Val Gly

657 666 675 684 693 702

Pro Pro Gly Asn Pro Gly Ala Asn Gly Leu Thr Gly Ala Lys Gly Ala Ala Gly

711 720 729 738 747 756

Leu Pro Gly Val Ala Gly Ala Pro Gly Leu Pro Gly Pro Arg Gly Ile Pro Gly

765 774 783 792 801 810

Pro Val Gly Ala Ala Gly Ala Thr Gly Ala Arg Gly Leu Val Gly Glu Pro Gly

819 828 837 846 855 864

Pro Ala Gly Ser Lys Gly Glu Ser Gly Asn Lys Gly Glu Pro Gly Ser Ala Gly

873 882 891 900 909 918

Pro Gln Gly Pro Pro Gly Pro Ser Gly Glu Glu Gly Lys Arg Gly Pro Asn Gly

927 936 945 954 963 972 

GAA GCT GGA TCT GCC GX CCT CCA GGA CCT CCT GGG CTG AGA GGT AGT CCT GGT

Glu Ala Gly Ser Ala Gly Pro Pro Gly Pro Pro Gly Leu Arg Gly Ser Pro Gly

981 990 999 1008 1017 1026

TCT CGT GGT CTT CCT GGA GCT GAT GGC AGA GCT GX GTC ATG GGC CCT CCT GGT

Ser Arg Gly Leu Pro Gly Ala Asp Gly Arg Ala Gly Val Met Gly Pro Pro Gly

1035 1044 1053 1062 1071 1080

ACT CGT GGT GCA AGT GGC CCT GCT GGA GTC CGA GGA CCT AAT <XL\ GAT GCT GGT

Ser Arg Gly Ala Ser Gly Pro Ala Gly Val Arg Gly Pro Asn Gly Asp Ala Gly

1089 1098 1107 1116 1125 1134

CGC CCT GGG GAG CCT GGT CTC ATG GGA CCC AGA GGT CTT CCT GGT TCC CCT GGA

Arg Pro Gly Glu Pro Gly Leu Met Gly Pro Arg Gly Leu Pro Gly Ser Pro Gly

1143 1152 1161 1170 1179 1188

AAT ATC GGC OGC GCT GGA AAA GAA GGT CCT GTC GGC CTC CCT GGC ATC GAC GGC

Asn Ile Gly Pro Ala Gly Lys Glu Gly Pro Val Gly Leu Pro Gly Ile Asp Gly

1197 1206 1215 1224 1233 1242

AGG CCT GGC CCA ATT GX CCA GCT GGA GCA AGA GGA GAG CCT GGC AAC ATT GGA

Arg Pro Gly Pro Ile Gly Pro Ala Gly Ala Arg Gly Glu Pro Gly Asn Ile Gly

1251 1260 1269 1278 1287 1296

TTC CCT GGA CCC AAA GGC OX ACT GGT GAT CCT GGC AAA AAC GGT GAT AAA GGT

Phe Pro Gly Pro Lys Gly Pro Thr Gly Asp Pro Gly Lys Asn Gly Asp Lys Gly

1305 1314 1323 1332 1341 1350

CAT GCT GGT CTT GCT GGT GCT CGG GGT GCT CCA GGT CCT GAT GGA AAC AAT GGT

His Ala Gly Leu Ala Gly Ala Arg Gly Ala Pro Gly Pro Asp Gly Asn Asn Gly

1359 1368 1377 1386 1395 1404 

GCT CAG GGA CCT CCT GGA CCA CAG GGT GTT CAA GGT GGA AAA GGT GAA CAG GGT 
Ala Gln Gly Pro Pro Gly Pro Gln Gly Val Gln Gly Gly Lys Gly Glu Gln Gly

1413 1422 1431 1440 1449 1458
14" U 3 rir rr-r CIG OCT GGC COC TCA GGT CCC GCT GOT
COC GAT GGT CCT CCA GQC TTC CAG GGT CTC OCT wu~ ^**»

Pro Ala Gly Pro Pro Gly Phe Gln Gly Leu Pro Gly Pro Ser Gly Pro Ala Gly

1467 1476 1485 1494 1503 1512

GAA GTT^GGC AAA ^^^^ AGG GGT CTC CAT GGT GAG TTT GGT CTC CCT GGT

Glu Val Gly Lys Pro Gly Glu Arg Gly Leu His Gly Glu Phe Gly Leu Pro Gly

1521 1530 1539 1548 1557 1566

CCT GCT GGT CCA AGA GGG GAA CGC GGT CCC CCA GGT GAG AGT GGT GCT GCC GGT

Pro Ala Gly Pro Arg Gly Glu Arg Gly Pro Pro Gly Glu Ser Gly Ala Ala Gly

1575 1584 1593 1602 1611 1620

CCT ACT GGT CCT ATT GGA AGC CGA GCT CCT TCT GGA CCC CCA GGG CCT GAT GGA

Pro Thr Gly Pro Ile Gly Ser Arg Gly Pro Ser Gly Pro Pro Gly Pro Asp Gly

1629 1638 1647 1656 1665 1674

AAC AAG GGT GAA CCT GGT GTG GTT GGT GCT GTG GGC ACT GCT GGT CCA TCT GGT

Asn Lys Gly Glu Pro Gly Val Val Gly Ala Val Gly Thr Ala Gly Pro Ser Gly

1683 1692 1701 1710 1719 1728

CCT AGT GGA CTC CCA GOV GAG AGG GGT GCT GCT GGC ATA CCT GGA GGC AAG GGA

Pro Ser Gly Leu Pro Gly Glu Arg Gly Ala Ala Gly Ile Pro Gly Gly Lys Gly

1737 1746 1755 1764 1773 1782

GAA AAG GGT GAA CCT GGT CTC AGA GGT GU ATT GGT AAC CCT GGC AGA GAT GGT

Glu Lys Gly Glu Pro Gly Leu Arg Gly Glu Ile Gly Asn Pro Gly Arg Asp Gly

1791 1800 1809 1818 1827 1836

GCT CGT GGT GCT CAT GGT GCT GTA GGT GCC CCT GGT CCT GCT GGA GCC ACA GGT

Ala Arg Gly Ala His Gly Ala Val Gly Ala Pro Gly Pro Ala Gly Ala Thr Gly

1845 1854 1863 1872 1881 1890

GAC CGG GGC GAA GCT GGG GCT GCT GGT CCT GCT GGT CCT GCT GGT CCT CGG GGA

Asp Arg Gly Glu Ala Gly Ala Ala Gly Pro Ala Gly Pro Ala Gly Pro Arg Gly

1899 1908 1917 1926 1935 1944

AGC CCT GGT GAA CGT GGC GAG GTC GGT CCT GCT GGC CCC AAC GGA TTT GCT GGT

Ser Pro Gly Glu Arg Gly Glu Val Gly Pro Ala Gly Pro Asn Gly Phe Ala Gly

1953 1962 1971 1980 1989 1998

CCG GCT GGT GCT GCT GGT CAA CCG GGT GCT AAA GGA GAA AGA GGA GCC AAA GGG

Pro Ala Gly Ala Ala Gly Gln Pro Gly Ala Lys Gly Glu Arg Gly Ala Lys Gly

2007 2016 2025 2034 2043 2052

CCT AAG GGT GAA AAC GGT GTT GTT GGT CCC ACA GGC CCC GTT GGA GCT GCT GGC

Pro Lys Gly Glu Asn Gly Val Val Gly Pro Thr Gly Pro Val Gly Ala Ala Gly

2061 2070 2079 2088 2097 2106

CCN NNK GGT CCA AAT GGT CCC CCC GGT CCT GCT GCA AGT CGT GGT GAT GGA GGC

Pro Xxx Gly Pro Asn Gly Pro Pro Gly Pro Ala Gly Ser Arg Gly Asp Gly Gly

2115 2124 2133 2142 2151 2160

CCC CCT GGT ATG ACT GGT TTC CCT GGT GCT GCT GGA CGG ACT GGT CCC CCA GGA

Pro Pro Gly Met Thr Gly Phe Pro Gly Ala Ala Gly Arg Thr Gly Pro Pro Gly

2169 2178 2187 2196 2205 2214

CCC TCT GGT ATT TCT GGC CCT CCT GGT CCC CCT GGT CCT GCT GGG AAA GAA GGG

Pro Ser Gly Ile Ser Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Lys Glu Gly

2223 2232 2241 2250 2259 2268

CTT CGT GGA CCN CGA CGI GAC CAA GGA CCA GCA GGC CGA CCT GGA GAA GTA GGA

Leu Arg Gly Pro Arg Gly Asp Gln Gly Pro Ala Gly Arg Pro Gly Glu Val Gly

2277 2286 2295 2304 2313 2322

GCA CCG GGT CCC CCT GGC TTC GCT GGT GAG AAG GGT CCC TCT GGA GAG GCT GGT

Ala Pro Gly Pro Pro Gly Phe Ala Gly Glu Lys Gly Pro Ser Gly Glu Ala Gly

2331 2340 2349 2358 2367 2376

ACT GCT GGA CCT CCT GGC ACT CCA GGT CCT CAG GGT CTT CTT GGT GCT CCT GGT

Thr Ala Gly Pro Pro Gly Thr Pro Gly Pro Gln Gly Leu Leu Gly Ala Pro Gly

2385 2394 2403 2412 2421 2430

ATT CTG GGT CTC CCT GGC TCG AGA GGT GAA CGT GGT CTA CCT GGT GTT GCT GGT

Ile Leu Gly Leu Pro Gly Ser Arg Gly Glu Arg Gly Leu Pro Gly Val Ala Gly

2439 2448 2457 2466 2475 2484

GCT GTG GGT GAA CCT GGT CCT CTT GGC ATT CCC GGC CCT CCT GGG GCC CGT GGT

Ala Val Gly Glu Pro Gly Pro Leu Gly Ile Ala Gly Pro Pro Gly Ala Arg Gly

2493 2502 2511 2520 2529 2538

CCT CCT GGT GCT GTG GGT AGT CCT GGA GTC AAC GGT GCT CCT GGT GAA GCT GGT

Pro Pro Gly Ala Val Gly Ser Pro Gly Val Asn Gly Ala Pro Gly Glu Ala Gly

2547 2556 2565 2574 2583 2592

OCT GAT GGC AAC CCT GGG AAC GAT GGT CCC CCA GGT CGC GAT GGT CAA CCC GGA

Arg Asp Gly Asn Pro Gly Asn Asp Gly Pro Pro Gly Arg Asp Gly Gln Pro Gly

2601 2610 2619 2628 2637 2646

CA£ AAG GGA GAG CGC GGT TAC CCT GGC AAT ATT GGT CCC GTT GGT GCT GCA GGT

His Lys Gly Glu Arg Gly Tyr Pro Gly Asn Ile Gly Pro Val Gly Ala Ala Gly

2655 2664 2673 2682 2691 2700

GCA CCT GGT CCT CAT GGC CCC GTG GGT CCT GCT GGC AAA CAT GGA AAC CGT GGT

Ala Pro Gly Pro His Gly Pro Val Gly Pro Ala Gly Lys His Gly Asn Arg Gly

2709 2718 2727 2736 2745 2754

GAA ACT GGT CCT TCT GGT CCT GTT GG? CCT GCT GGT GCT GTT GGC CCA AGA GGT

Glu Thr Gly Pro Ser Gly Pro Val Gly Pro Ala Gly Ala Val Gly Pro Arg Gly

2763 2772 2781 2790 2799 2808

CCT AGT GGC CCA CAA GGC ATT CGT GGC GAT AAG GGA GAG CCC GGT GAA AAG GGG

Pro Ser Gly Pro Gln Gly Ile Arg Gly Asp Lys Gly Glu Pro Gly Glu Lys Gly

2817 2826 2835 2844 2853 2862

CCC AGA GGT CTT CCT GGC TTC AAG GGA CAC AAT GGA TTG CAA GGT CTG CCT GGT

Pro Arg Gly Leu Pro Gly Phe Lys Gly His Asn Gly Leu Gln Gly Leu Pro Gly

2871 2880 2889 2898 2907 2916

ATC GCT GGT CAC CAT GGT GAT CAA GGT GCT CCT GGC TCC GTG GGT CCT GCT GGT

Ile Ala Gly His His Gly Asp Glu Gly Ala Pro Gly Ser Val Gly Pro Ala Gly

2925 2934 2943 2952 2961 2970

CCT AGG GGC CCT GCT GGT CCT TCT GGC CCT GCT GGA AAA GAT GGT CGC ACT GGA

Pro Arg Gly Pro Ala Gly Pro Ser Gly Pro Ala Gly Lys Asp Gly Arg Thr Gly

2979 2988 2997 3006 3015 3024

CAT CCT GGT ACG GTT GGA CCT GCT GGC ATT CGA GGC CCT CAG GGT CAC CAA GGC

His Pro Gly Thr Val Gly Pro Ala Gly Ile Arg Gly Pro Gln Gly His Gln Gly

3033 3042 3051 3060 3069 3078

CCT GCT GGC CCC CCT GGT CCC CCT GGC CCT CTT GGA CCT CTA GGT GTA AGC GGT

Pro Ala Gly Pro Pro Gly Pro Pro Gly Pro Leu Gly Pro Leu Gly Val Ser Gly

3087 3096 3105 3114

GGT GGT TAT GAC TTT GGT TAC GAT GGA GAC TTC TAC AGG GCT 3'

Gly Gly Tyr Asp Phe Gly Tyr Asp Gly Asp Phe Tyr Arg Ala
9 18 27 36 45 54

CAG TAC GAC GGT AAA GGC GTA GGC CTG GGT CCG GGT CCG ATG GGC CTG ATG GGT

Gln Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu Met Gly

63 72 81 90 99 108

CCA CGT GGC CCA CCG GGT GCA GCA GGT GCG CCG GGT CCG CAG GGC TTC CAA GGT

Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly

117 126 135 144 153 162

CCG GCG GGT GAA CCG GGC GAA CCG GGT CAG ACG GGT CCG GCG GGT GCT CGC GGT

Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly Pro Ala Gly Ala Arg Gly

171 180 189 198 207 216

CCG GCT GGC CCA CCG GGC AAA GCT GGC GAA GAC GGT CAC CCG GGT AAG CCA GGC

Pro Ala Gly Pro Pro Gly Lys Ala Gly Glu Asp Gly His Pro Gly Lys Pro Gly

225 234 243 252 261 270

CGC CCG GGC GAA CGT GGC GTC GTG GGT CCG CAA GGT GCG CGT GGT TTC CCG GGC

Arg Pro Gly Glu Arg Gly Val Val Gly Pro Gln Gly Ala Arg Gly Phe Pro Gly

279 288 297 306 315 324

ACG CCG GGT CTG CCG GGT TTC AAA GGC ATT CGT GGT CAC AAC GGT CTG GAC GGT

Thr Pro Gly Leu Pro Gly Phe Lys Gly Ile Arg Gly His Asn Gly Leu Asp Gly

333 342 351 360 369 378

CTG AAA GGC CAA CCG GGT GCT CCG GGC GTC AAA GGC GAA CCG GGT GCC CCA GGC

Leu Lys Gly Gln Pro Gly Ala Pro Gly Val Lys Gly Glu Pro Gly Ala Pro Gly

387 396 405 414 423 432

GAA AAC GGT ACG CCG GGC CAG ACT GGT GCG CGT GGT CTG CCG GGT GAA CGC GGC

Glu Asn Gly Thr Pro Gly Gln Thr Gly Ala Arg Gly Leu Pro Gly Glu Arg Gly

441 450 459 468 477 486

CGT GTT GGC GCT CCG GGT CCG GCT GGC GCG CGT GGC AGC GAT GGC TCC GTC GGT

Arg Val Gly Ala Pro Gly Pro Ala Gly Ala Arg Gly Ser Asp Gly Ser Val Gly

495 504 513 522 531 540

CCG GTT GGC CCT GCG GGT CCG ATT GGT TCC GCT GGC CCT CCG GGT TTC CCG GGT

Pro Val Gly Pro Ala Gly Pro Ile Gly Ser Ala Gly Pro Pro Gly Phe Pro Gly

549 558 567 576 585 594

GCG CCG GGT CCG AAG GGT GAG ATC GGC GCG GTT GGC AAC GCA GGC CCG GCT GGT

Ala Pro Gly Pro Lys Gly Glu Ile Gly Ala Val Gly Asn Ala Gly Pro Ala Gly

603 612 621 630 639 648

CCA GCC GGC CCT CGT GGC GAA GTC GGT CTG CCG GGT CTG AGC GGT CCG GTA GGC

Pro Ala Gly Pro Arg Gly Glu Val Gly Leu Pro Gly Leu Ser Gly Pro Val Gly

657 666 675 684 693 702

CCA CCG GGT AAC CCG GGC GCA AAC GGC CTG ACG GGT GCA AAA GGT GCG GCT GGC

Pro Pro Gly Asn Pro Gly Ala Asn Gly Leu Thr Gly Ala Lys Gly Ala Ala Gly

711 720 729 738 747 756

CTC CCG GGC GTT GCC GGT GCC CCG GGC CTG CCG GGT CCG CGC GGT ATT CCG GGT

Leu Pro Gly Val Ala Gly Ala Pro Gly Leu Pro Gly Pro Arg Gly Ile Pro Gly

765 774 783 792 801 810

CCG CTA GGC GCA GCC GGT GCA ACT GGT GCC CGT GGC CTG GTT GGC GAA CCG GGT

Pro Val Gly Ala Ala Gly Ala Thr Gly Ala Arg Gly Leu Val Gly Glu Pro Gly

819 828 837 846 855 864

CCG GCG GGT TCT AAA GGC GAA AGC GGT AAC AAA GGT GAG CCG GGT TCC GCG GGC

Pro Ala Gly Ser Lys Gly Glu Ser Gly Asn Lys Gly Glu Pro Gly Ser Ala Gly

873 882 891 900 909 918

CCG CAG GGT CCG CCG GGT CCG AGC GGC GAA GAA GGT AAA CGT GGT CCG AAC GGC

Pro Gln Gly Pro Pro Gly Pro Ser Gly Glu Glu Gly Lys Arg Gly Pro Asn Gly

927 936 945 954 963 972

GAG GCT GGT TCC GCA GGC CCT CCG GGT CCG CCG GGT CTG CGT GGC AGC CCG GGT

Glu Ala Gly Ser Ala Gly Pro Pro Gly Pro Pro Gly Leu Arg Gly Ser Pro Gly

981 990 999 1008 1017 1026

AGC CGT GGC CTG CCG GGC GCG GAC GGC CGT GCG GGC CTG ATG GGT CCG CCG GGT

Ser Arg Gly Leu Pro Gly Ala Asp Gly Arg Ala Gly Val Met Gly Pro Pro Gly

1035 1044 1053 1062 1071 1080

TCC CGT GGT GCC TCT GGT CCG GCT GGT GTC CGT GGT CCG AAT GGC GAC GCG GGC

Ser Arg Gly Ala Ser Gly Pro Ala Gly Val Arg Gly Pro Asn Gly Asp Ala Gly

1089 1098 1107 1116 1125 1134

CGT CCG GGT GAA CCG GGC CTG ATG GGT CCG CGT GGC CTG CCG GGT AGC CCG GGT

Arg Pro Gly Glu Pro Gly Leu Met Gly Pro Arg Gly Leu Pro Gly Ser Pro Gly

1143 1152 1161 1170 1179 1188

AAC ATT GGT CCG GCG GGT AAG GAG GGT CCG GTA GGT CTG CCG GGT ATT GAT GGT

Asn Ile Gly Pro Ala Gly Lys Glu Gly Pro Val Gly Leu Pro Gly Ile Asp Gly

1197 1206 1215 1224 1233 1242

CGT CCG GCT CCG ATC GGC CCT GCG GGC GCT CGT GGC GAG CCG GGT AAC ATC GGT

Arg Pro Gly Pro Ile Gly Pro Ala Gly Ala Arg Gly Glu Pro Gly Asn Ile Gly

1251 1260 1269 1278 1287 1296

TTT CCG GGT CCG AAG GGT CCG ACG GGC GAC CCG GGC AAG AAC GGT GAT AAA GGC

Phe Pro Gly Pro Lys Gly Pro Thr Gly Asp Pro Gly Lys Asn Gly Asp Lys Gly

1305 1314 1323 1332 1341 1350

CAT GCA GGT CTG GCA GGT GCC CGT GGT GCA CCG GGT CCG GAT GGT AAC AAT GGT

His Ala Gly Leu Ala Gly Ala Arg Gly Ala Pro Gly Pro Asp Gly Asn Asn Gly
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1359 1368 1377 1386 1395 1404 

GCG CAG GGT CCG CCG GGT CCG CAG GGC GTA CAG GGT GGC AAA GGT GAA CAG GGT 

Ala Gin Gly Pro Pro Gly Pro Gin Gly Val Gin Gly Gly Lys Gly Glu Gin Gly 

1413 1422 1431 1440 1449 1458 

CCG GCA GGC CCA OCG GGC TTC CAG GGT CTG CCG GGT CCG AGC GGC CCG GCT GGT 

Pro Ala Gly Pro Pro Gly Phe Gin Gly Leu Pro Gly Pro Ser Gly Pro Ala Gly 

1467 1476 1485 1494 1503 1512 
GAA GTG GGC AAA COG GGC GAA CGT GGC CTC.CAT GGC GAG TTT GGC CTG CCG GGT 

Glu Val Gly Lys Pro Gly Glu Arg Gly Leu His Gly Glu Phe Gly Leu Pro Gly 

1521 1530 1539 1548 1557 1566 

CCG GCC GGT CCG CGT GGT GAG CGC GGC CCT CCG GGC GAA TCC GGC GCG GCA GGT 

Pro Ala Gly Pro Arg Gly Glu Arg Gly Pro Pro Gly Glu Ser Gly Ala Ala Gly 

1575 1584 1593 1602 1611 1620 

CCG ACC GGC CCG ATT GGT TCC CGT GGT CCG AGC GGC CCA CCG GGT CCG GAC GGC 

Pro Thr Gly Pro He Gly Ser Arg Gly Pro Ser Gly Pro Pro Gly Pro Asp Gly 

1629 4 * 1638 1647 1656 1665 1674 

AAC AAA GGC GAG OCG GGT GTT GTT GGT GCT GTT GGT ACC GCC GGC COG TCT GGT 

Asn Lys Gly Glu Pro Gly Val Val Gly Ala Val Gly Thr Ala Gly Pro Ser Gly 

1683 1692 1701 1710 1719 1728 

CCG AGC GGT CTG CCG GGC GAA CGC GGT GCC GCT GGT ATT CCG GGC GGC AAA GGT 

Pro Ser Gly Leu Pro Gly Glu Arg Gly Ala Ala Gly Ile Pro Gly Gly Lys Gly

1737 1746 1755 1764 1773 1782

GAA AAA GGT GAA CCG GGT CTG CGC GGT GAG ATT GGC AAC CCG GGC CGT GAC GGT

Glu Lys Gly Glu Pro Gly Leu Arg Gly Glu Ile Gly Asn Pro Gly Arg Asp Gly

1791 1800 1809 1818 1827 1836

GCT CGC GGT GCA CAC GGC GCG GTT GGC GCA CCG GGT CCG GCA GGC GCG ACT GGT

Ala Arg Gly Ala His Gly Ala Val Gly Ala Pro Gly Pro Ala Gly Ala Thr Gly

1845 1854 1863 1872 1881 1890

GAT CGT GGC GAA GCT GGT GCA GCG GGT CCG GCG GGT CCG GCC GGC CCT CGC GGT

Asp Arg Gly Glu Ala Gly Ala Ala Gly Pro Ala Gly Pro Ala Gly Pro Arg Gly

1899 1908 1917 1926 1935 1944 

TCC CCG GGC GAA CGC GGC GAA GTC GGC CCG GCT GGC CCG AAT GGC TTT GCT GGC 

Ser Pro Gly Glu Arg Gly Glu Val Gly Pro Ala Gly Pro Asn Gly Phe Ala Gly 

1953 1962 1971 1980 1989 1998 

CCA GCG GGC GCT GCG GGC CAA CCG GGT GCG AAA GGT GAG CGC GGT GCC AAA GGC 

Pro Ala Gly Ala Ala Gly Gln Pro Gly Ala Lys Gly Glu Arg Gly Ala Lys Gly

2007 2016 2025 2034 2043 2052

CCG AAA GGT GAA AAT GGT GTA GTT GGT CCG ACG GGT CCG GTT GGT GCG GCT GGT

Pro Lys Gly Glu Asn Gly Val Val Gly Pro Thr Gly Pro Val Gly Ala Ala Gly

2061 2070 2079 2088 2097 2106

CCG GCT GGC CCG AAT GGT CCG CCG GGT CCG GCA GGC AGC CGT GGC GAT GGT GGC

Pro Ala Gly Pro Asn Gly Pro Pro Gly Pro Ala Gly Ser Arg Gly Asp Gly Gly

2115 2124 2133 2142 2151 2160

CCAttSCX^ATCACCGCTTICCCTGCrGCGGCC

Pro Pro Gly Met Thr Gly Phe Pro Gly Ala Ala Gly Arg Thr Gly Pro Pro Gly

2169 2178 2187 2196 2205 2214 

CCG TCT GGC ATT TCT GGC CCA CCG GGT CCG CCG GGT CCG GCG GGC AAA GAA GGT

Pro Ser Gly Ile Ser Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Lys Glu Gly

2223 2232 2241 2250 2259 2268

CTG CGT GGC CCA CGC GGC GAC CAG GGT CCG GTG GGC CGT ACC GGC GAA GTC GGT

Leu Arg Gly Pro Arg Gly Asp Gln Gly Pro Val Gly Arg Thr Gly Glu Val Gly

2277 2286 2295 2304 2313 2322

GCT GTT GGC CCT CCG GGC TTT GCG GGT GAG AAA GGT CCG AGC GGT GAA GCT GGC

Ala Val Gly Pro Pro Gly Phe Ala Gly Glu Lys Gly Pro Ser Gly Glu Ala Gly

2331 2340 2349 2358 2367 2376

ACCGC^GGCCXGCCGGGTACGOXSGGTCCGC^

Thr Ala Gly Pro Pro Gly Thr Pro Gly Pro Gln Gly Leu Leu Gly Ala Pro Gly

2385 2394 2403 2412 2421 2430

ATC CTG GGC CTG CCG GGC TCC CGT GGC GAA CGC GGT CTG CCG GGC GTT GCA GGC

Ile Leu Gly Leu Pro Gly Ser Arg Gly Glu Arg Gly Leu Pro Gly Val Ala Gly

2439 2448 2457 2466 2475 2484

GCT GTA GGC GAA CCG GGT CCG CTG GGT ATC GCG GGT CCG CCG GGT GCG CGT GGT

Ala Val Gly Glu Pro Gly Pro Leu Gly Ile Ala Gly Pro Pro Gly Ala Arg Gly

2493 2502 2511 2520 2529 2538 

CCG CCG GGT GCC GTG GGC TCT CCG GGT GTT AAC GGC GCC CCT GGT GAA GCG GGC

Pro Pro Gly Ala Val Gly Ser Pro Gly Val Asn Gly Ala Pro Gly Glu Ala Gly

2547 2556 2565 2574 2583 2592

CGC GAC GGC AAT CCG GGC AAC GAT GGT CCG CCG GGT CGT GAT GGT CAG CCG GGT

Arg Asp Gly Asn Pro Gly Asn Asp Gly Pro Pro Gly Arg Asp Gly Gln Pro Gly

2601 2610 2619 2628 2637 2646 

CAC AAA GGT GAG CGT GGC TAC CCG GGT AAC ATC GGT CCG GTT GGT GCG GCC GGC 

His Lys Gly Glu Arg Gly Tyr Pro Gly Asn Ile Gly Pro Val Gly Ala Ala Gly

2655 2664 2673 2682 2691 2700

GCT CCG GGT CCG CAC GGT CCG GTA GGC CCA GCC GGC AAA CAC GGT AAC CGT GGT

Ala Pro Gly Pro His Gly Pro Val Gly Pro Ala Gly Lys His Gly Asn Arg Gly

2709 2718 2727 2736 2745 2754

GAA ACG GGT CCG TCC GGT CCG GTA GGT CCG GCG GGT GCT GTT GGT CCA CGC GGC

Glu Thr Gly Pro Ser Gly Pro Val Gly Pro Ala Gly Ala Val Gly Pro Arg Gly
2763 2772 2781 2790 2799 2808

CCG TCC GGC CCG CAG GGT ATT CGC GGT GAC AAA GGC GAA CCG GGC GAA AAA GGT

Pro Ser Gly Pro Gln Gly Ile Arg Gly Asp Lys Gly Glu Pro Gly Glu Lys Gly

2817 2826 2835 2844 2853 2862

CCG CGT GGT CTG CCG GGC CTT AAG GGC CAC AAC GGT CTG CAA GGT CTG CCG GGT

Pro Arg Gly Leu Pro Gly Leu Lys Gly His Asn Gly Leu Gln Gly Leu Pro Gly

2871 2880 2889 2898 2907 2916

ATC GCG GGT CAC CAC GGT GAT CAG GGT GCT CCG GGT TCC GTT GGT CCG GCC GGT

Ile Ala Gly His His Gly Asp Gln Gly Ala Pro Gly Ser Val Gly Pro Ala Gly

2925 2934 2943 2952 2961 2970

CCG CGT GGC CCG GCT GGT CCG TCT GGT CCG GCC GGT AAA GAC GGC CGT ACG GGC

Pro Arg Gly Pro Ala Gly Pro Ser Gly Pro Ala Gly Lys Asp Gly Arg Thr Gly

2979 2988 2997 3006 3015 3024

CAC CCG GGT ACG GTG GGT CCG GCC GGC ATT CGC GGT CCG CAA GGT CAC CAG GGT

His Pro Gly Thr Val Gly Pro Ala Gly Ile Arg Gly Pro Gln Gly His Gln Gly

3033 3042 3051 3060 3069 3078

CCG GCG GGT CCG (XGGCTCXGOC»<Xn , OCG<X^GCTCCGCCGGGT GTT AGC GGT

Pro Ala Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Val Ser Gly

3087 3096 3105 3114

GGC GGT TAT GAT TTT GGT TAT GAC GGT GAT TTC TAT CGT GCG 3'

Gly Gly Tyr Asp Phe Gly Tyr Asp Gly Asp Phe Tyr Arg Ala



FIG. 50E 



117 base pair fragment of human
Type I (α2) with optimized codon usage



FIG. 52 
240 base pair fragment of human
Type I (α2) with optimized codon usage



FIG. 53 
9 18 27 36 45 54

CAG TAT GAT GGC AAA GGC GTC GGC CTC GGC CCG GGC CCA ATG GGC CTC ATG GGC

Gln Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro Gly Pro Met Gly Leu Met Gly

63 72 81 90 99 108

(XGCGCC^CCAOOGGGTCX^GCTGGCGOCa^GGCCXS

Pro Arg Gly Pro Pro Gly Ala Ala Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly

117 126 135 144 153 162

CCT GCC GGT GAG CCG GGT GAA CCG GGC CAA ACG GCT CCG GCA GGT CCA CGT GGT

Pro Ala Gly Glu Pro Gly Glu Pro Gly Gln Thr Gly Pro Ala Gly Ala Arg Gly

171 180 189 198 207 216

CCA GCG GGC CCG CCT GGC AAG GCG GGT GAA GAT GGC CAC CCT GGC AAA CCG GGC

Pro Ala Gly Pro Pro Gly Lys Ala Gly Glu Asp Gly His Pro Gly Lys Pro Gly

225 234

CGC CCG GGT GAG CGT GGC GTA GTG

Arg Pro Gly Glu Arg Gly Val Val



FIG. 54
Human collagen Type I (α2)
with optimized codon usage



FIG. 55 
240 base pair fragment of human
Type I (α2) with optimized codon usage



FIG. 56 
FIG. 58
GST-Coll

+Hyp titod +IPTG +IPTG
Gene for glutathione S-transferase
fused to gene for collagen mimetic 4



FIG. 60 
9 18 27 36 45 54

5' ATG GGG CTC GCT GGC CCA CCG GGC GAA CCG GGT CCG CCA GGC CCG AAA GGT CCG

M G L A G P P G E P G P P G P K G P

63 72 81 90 99 108

CGT GGC GAT AGC GGG CTC GCT GGC CCA CCG GGC GAA CCG GGT CCG CCA GGC CCG

R G D S G L A G P P G E P G P P G P

117 126 135 144 153 162

AAA GGT CCG CGT GGC GAT AGC GGG CTC GCT GGC CCA CCG GGC GAA CCG GGT CCG

K G P R G D S G L A G P P G E P G P

171 180 189 198 207 216

CCA GGC CCG AAA GGT CCG CGT GGC GAT AGC GGG CTC GCT GGC CCA CCG GGC GAA

P G P K G P R G D S G L A G P P G E

225 234 243 252 261 270

CCG GGT CCG CCA GGC CCG AAA GGT CCG CGT GGC GAT AGC GGG CTC CCG GGC GAT

P G P P G P K G P R G D S G L P G D

TCC TAA 3'

S *
Gene for 219 amino acid C-terminal
fragment of Type I (α1) human collagen
with optimized E. coli codon usage fused
to gene for glutathione S-transferase



FIG. 67 
Gene for 207 amino acid C-terminal
fragment of Type I (α2) human collagen
with optimized E. coli codon usage fused
to gene for glutathione S-transferase



FIG. 68 
Protein Sequence of the First 13 amino acids of D4-α1. Predicted From the DNA Sequence:
H2N-Gly-Pro-Pro-Gly-Leu-Ala-Gly-Pro-Pro-Gly-Glu-Ser-Gly

Experimentally-Determined Protein Sequence of the First 13 amino acids of D4-α1:
H2N-Gly-Hyp-Hyp-Gly-Leu-Ala-Gly-Hyp-Hyp-Gly-Glu-Ser-Gly
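
The comparison between the predicted and the experimentally determined sequence can be checked mechanically. The following is a minimal Python sketch, not part of the original disclosure: the function name `substituted_positions` is hypothetical, and the two strings are the 13-residue sequences listed above with the N-terminal label omitted. It reports each position where the observed residue differs from the predicted one.

```python
# Illustrative sketch only; the names below are assumptions, not from the source.
# Residues are the first 13 amino acids listed above (N-terminal label omitted).
PREDICTED = "Gly-Pro-Pro-Gly-Leu-Ala-Gly-Pro-Pro-Gly-Glu-Ser-Gly"
OBSERVED = "Gly-Hyp-Hyp-Gly-Leu-Ala-Gly-Hyp-Hyp-Gly-Glu-Ser-Gly"


def substituted_positions(predicted, observed):
    """Return (1-based position, predicted residue, observed residue) per mismatch."""
    pred = predicted.split("-")
    obs = observed.split("-")
    if len(pred) != len(obs):
        raise ValueError("sequences must have the same number of residues")
    return [
        (i, p, o)
        for i, (p, o) in enumerate(zip(pred, obs), start=1)
        if p != o
    ]


if __name__ == "__main__":
    for pos, p, o in substituted_positions(PREDICTED, OBSERVED):
        print(f"position {pos}: {p} -> {o}")
```

For these two sequences the only differences are at positions 2, 3, 8 and 9, where Pro in the predicted sequence appears as Hyp in the determined sequence.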
9 18 27 36 45 54

ATG GGC CCC CCG CGT CTC CCG GGC CCT CCG GGT GAA AGC GGT CGT GAA GGC GCG CCG GGT

69 78 87 96 105 114

GCC GAA CGC AGC CCA GGC CGC GAC GGT AGC CCG GGG GCC AAA GGG GAT CGT GGT GAA ACC

129 138 147 156 165 174

GGC CCG GCC GGC CCC CCG GGT GCA CCG GGC CCG CCG GGT GCC CCA GGC CCG GTG GGC CCG

189 198 207 216 225 234

GCG GGC AAA AGC GGT GAT CGT GGT GAG ACC GGT CCG GCG GGC CCG GCC GGT CCG GTG GGC

249 258 267 276 285 294

CCA GCG GGC GGC CGT GGC CCG GCC GGT CCG CAG GGC CCG CGG GGT GAC AAA GGT GAA ACG

309 318 327 336 345 354

GGC GAA CAG GGC GAC CGT GGC ATT AAA GGC CAC CGT GGC TTC AGC GGC CTG CAG GGT CCA

369 378 387 396 405 414

CCG GGC CCG CCG GGC ACT CCG GGT GAA CAG GGT CCG TCC GGA GCC AGC GGG CCG GCG GGC

429 438 447 456 465 474

CCA CGC GGT CCG CCG GGC AGC GCG GGC GCG CCG GGC AAA GAC GGT CTC AAC GGT CTG CCG

489 498 507 516 525 534

GCC CCG ATC GGC CCG CCG GGC CCA CGC CGC CGC ACC GGT GAT GCG GGT CCG GTG GGT CCC

549 558 567 576 585 594

CCG GGC CCG CCG GGC CCG CCA GGC CCG CCG GGA CCG CCG AGC GCG GGT TTC GAC TTC AGC

609 618 627 636 645 654

TTC CTC CCG CAG CCG CCG CAG GAG AAA GCG CAC GAC GGC GGT CGC TAC TAC CCT GCG TAA

10 20 30 40 50 60

MGPPGLAGPP GESGREGAPG AEGSPGRDGS PGAKGDRGET GPAGPPGAPG APGAPGPVGP

70 80 90 100 110 120

AGKSGDRGET GPAGPAGPVG PAGARGPAGP QGPRGDKGET GEQGDRGIKG HRGFSGLQGP

130 140 150 160 170 180

PGPPGSPGEQ GPSGASGPAG PRGPPGSAGA PGKDGLNGLP GPIGPPGPRG RTGDAGPVGP

190 200 210 220 230 240

PGPPGPPGPP GPPSAGFDFS FLPQPPQEKA HDGGRYYRA*

FIG. 73
Gene for 70kDA




FIG. 75

Gene for 70kDA
Fragment of Human

Gene for 70kDA
Fragment of Human
Fibronectin Fusion
with Gene for Human

Gene for Type I (α1)
Human Collagen with
Optimized E. coli
Codon Usage Fusion
atgggctctccgggtgttaacggcgcccctggtgaagcgggccgcgacggcaatccggg

caacgatggtccgccgggtcgtgatggtcagccgggtcacaaaggtgagcgtggctacc

cgggtaacatcggtccggttggtgcggccggcgctccgggtccgcacggtccggtaggc

ccagccggcaaacacggtaaccgtggtgaaacgggtccgtccggtccggtaggtccggc

gggtgctgttggtccacgcggcccgtccggcccgcagggtattcgcggtgacaaaggcg

aaccgggcgaaaaaggtccgcgtggtctgccgggccttaagggccacaacggtctgcaa

ggtctgccgggtatcgcgggtcaccacggtgatcagggtgctccgggttccgttggtccg

gccggtccgcgtggcccggctggtccgtctggtccggccggtaaagacggccgtacggg

ccacccgggtacggtgggtccggccggcattcgcggtccgcaaggtgaccagggtccgg

cgggtccgccgggtccgccgggtccgccgggtccgccgggtgttagcggtggcggttat

gattttggttatgacggtgatttctatcgtgcgtaa

MGPPGLAGPPGESGREGAPGAEGSPGRDGSPGAKGDRGETGPAGPPGAPGAPGAPGPVGPA
GKSGDRGETGPAGPAGPVGPAGARGPAGPQGPRGDKGETGEQGDRGIKGHRGFSGLQGPPG
PPGSPGEQGPSGASGPAGPRGPPGSAGAPGKDGLNGLPGPIGPPGPRGRTGDAGPVGPPGPPG
PPGPPGPPSAGFDFSFLPQPPQEKAHDGGRYYRA

Oligo N4-1

5'GGAATTCTCCCATGGGCCCGCCGGGTCTGGCGCKjCCCTCCGGGTGAAAGCGGTCGTGA 

aggcgcgccgggtgccgaaggcagcccaggccgcgac 



Oligo N4-2 

3'CTTCCGTCGGGTCCGGCGCTGCCATCGGGCCeCCGGTTTCCCCTAGCACCACTTTGGCC 
GGGCCGCCCGGGGGGCCCACGTGGCATTATTCGAACCC 



Oligo N4-3 

5'GGAATTCGGTGCACCGGGCGCGCCGGGTGCCCCAGGCGCGGTGGGCCCGGCGGGCAAA
AGCGGTGATCGTGGCGAGACCGGTCCGGCGGGC 



Oligo N4-4 

S'CTCTGGCCAGGCCGCCCGGGCCGGCGAGGCCACCCGGGTCGCCCGCGGGCACCGGGCC 
GGCCAGGCGTCCCGGGCGCCATTATTCGAACCC 



FIG. 81 


